Multimedia Tools and Applications, Volume 68, Issue 2, pp 401–412

Contextual keyword extraction by building sentences with crowdsourcing

  • Soon Gill Hong
  • Sungho Shin
  • Mun Yong Yi


Automatic keyword extraction from documents has long been used and has proven useful in various areas. Crowdsourced tagging of multimedia resources has emerged and shows some promise. Automatic approaches for unstructured data, automatic keyword extraction, and crowdsourced tagging are efficient, but they all suffer from a lack of contextual understanding. In this paper, we propose a new model for extracting key contextual terms from unstructured data, especially documents, with crowdsourcing. The model consists of four sequential processes: (1) term selection by frequency, (2) sentence building, (3) revised term selection reflecting the newly built sentences, and (4) sentence voting. Online workers read only a fraction of a document and participated in the sentence building and sentence voting processes, and key sentences were generated as a result. We compared the generated sentences to the keywords entered by the author and to the sentences generated by offline workers who read the whole document. The results support the idea that the sentence building process can help select terms with more contextual meaning, closing the gap between keywords from automated approaches and the contextual understanding required by humans.
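The four processes listed in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the scoring rules, so everything here is an assumption made for illustration: the function names (`select_terms`, `revise_terms`, `vote`), the tiny stopword list, the choice to re-rank terms by how often workers used them, and plain majority counting for votes. Step 2 (sentence building) is performed by human workers, so it appears only as input data.

```python
import re
from collections import Counter

# Hypothetical stopword list; the paper's actual filtering is not described.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "with", "by", "on"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def select_terms(text, k):
    """Step 1: pick the k most frequent terms in the document fragment."""
    counts = Counter(tokenize(text))
    return [term for term, _ in counts.most_common(k)]


def revise_terms(seed_terms, built_sentences, k):
    """Step 3 (assumed rule): re-rank terms by their use in worker-built sentences."""
    counts = Counter()
    for sentence in built_sentences:
        counts.update(tokenize(sentence))
    for term in seed_terms:
        counts.setdefault(term, 0)  # seed terms nobody used keep a zero count
    return [term for term, _ in counts.most_common(k)]


def vote(sentences, ballots):
    """Step 4 (assumed rule): order candidate sentences by number of votes."""
    tally = Counter(ballots)
    return sorted(sentences, key=lambda s: -tally[s])


fragment = "crowd workers build sentences. crowd workers vote. crowd sentences"
seed = select_terms(fragment, 3)
# Step 2 happens offline: workers write sentences using the seed terms.
built = ["crowd workers vote on sentences", "workers vote"]
revised = revise_terms(seed, built, 2)
ranked = vote(built, ["workers vote", "workers vote", "crowd workers vote on sentences"])
```

In this toy run, a term such as "vote" that was not among the most frequent in the fragment can still enter the revised list because workers chose it when building sentences, which is the kind of shift toward contextual terms the abstract describes.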


  • Crowdsourcing
  • Keyword extraction
  • Document summary
  • Content extraction
  • Sentence building
  • Contextual term extraction



This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2011‐0029185).



Copyright information

© Springer Science+Business Media New York 2012

Authors and Affiliations

  1. Department of Knowledge Service Engineering, KAIST, Daejeon, Republic of Korea
  2. Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
