Key Phrase Extraction of Lightly Filtered Broadcast News

  • Luís Marujo
  • Ricardo Ribeiro
  • David Martins de Matos
  • João P. Neto
  • Anatole Gershman
  • Jaime Carbonell
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7499)


This paper explores the impact of light filtering on automatic key phrase extraction (AKE) applied to Broadcast News (BN). Key phrases are the words and expressions that best characterize the content of a document. They are often used to index the document or as features in further processing, which makes improvements in AKE accuracy particularly important. We hypothesized that filtering out marginally relevant sentences from a document would improve AKE accuracy, and our experiments confirmed this hypothesis: eliminating as little as 10% of a document's sentences led to a 2% improvement in AKE precision and recall. Our AKE method is built on the MAUI toolkit, which follows a supervised learning approach. We trained and tested it on a gold standard made of 8 BN programs containing 110 manually annotated news stories. The experiments were conducted within a Multimedia Monitoring Solution (MMS) system for TV and radio news/programs that runs daily and monitors 12 TV and 4 radio channels.
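The light-filtering step can be illustrated with a minimal sketch. It assumes that a sentence's relevance is approximated by its cosine similarity to the bag-of-words centroid of the whole document; this scoring choice, the function names, and the `drop_fraction` parameter are illustrative assumptions, not the filtering method evaluated in the paper. The surviving sentences would then be passed on to the key phrase extractor.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def light_filter(sentences, drop_fraction=0.10):
    """Drop the lowest-scoring fraction of sentences, keeping document order.

    Assumption: relevance is scored as similarity to the document centroid.
    """
    bags = [Counter(tokenize(s)) for s in sentences]

    # The centroid is the summed term counts of all sentences.
    centroid = Counter()
    for bag in bags:
        centroid.update(bag)

    scores = [cosine(bag, centroid) for bag in bags]

    # Number of sentences to remove, rounded down.
    n_drop = int(len(sentences) * drop_fraction)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i])
    dropped = set(ranked[:n_drop])

    return [s for i, s in enumerate(sentences) if i not in dropped]
```

With ten sentences and `drop_fraction=0.10`, exactly one sentence is removed: the one that shares the least vocabulary with the rest of the story.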


Keyphrase extraction, Speech summarization, Speech browsing, Broadcast News speech recognition




Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Luís Marujo (1, 2, 4)
  • Ricardo Ribeiro (2, 3)
  • David Martins de Matos (2, 4)
  • João P. Neto (2, 4)
  • Anatole Gershman (1)
  • Jaime Carbonell (1)

  2. L2F - INESC ID Lisboa, Portugal
  3. Instituto Universitário de Lisboa (ISCTE-IUL), Portugal
  4. Instituto Superior Técnico, Universidade Técnica de Lisboa, Portugal
