Automated Coding of Decision Support Variables

  • Massimiliano Albanese
  • Marat Fayzullin
  • Jana Shakarian
  • V. S. Subrahmanian
Chapter

Abstract

With the enormous amount of textual information now available online, there is an increasing demand, especially in the national security community, for tools capable of automatically extracting certain types of information from massive amounts of raw data. In the last several years, ad hoc Information Extraction (IE) systems have been developed to help address this need [6]. However, in some applications the questions that need to be answered are far more complex than those traditional IE systems can handle, and answering them requires integrating information from several sources. For instance, political scientists need to monitor political organizations and conflicts, while defense and security analysts need to monitor terrorist groups. Typically, political scientists and analysts define a long list of variables, referred to as a “codebook”, that they want to monitor over time for a number of groups. Currently, in most such efforts, the task of finding the right value for each variable, denoted as “coding”, is performed manually by human coders and is extremely time consuming. Thus, the need for automation is enormous.

References

  1. Albanese M, Subrahmanian VS (2007) T-REX: a system for automated cultural information extraction. In: Proceedings of the first international conference on computational cultural dynamics (ICCCD ’07). AAAI, Menlo Park, pp 2–8
  2. Amitay E, Har’El N, Sivan R, Soffer A (2004) Web-a-where: geotagging web content. In: Proceedings of the 27th annual international ACM SIGIR conference on research and development in information retrieval. ACM, New York, pp 273–280
  3. Callan J, Mitamura T (2002) Knowledge-based extraction of named entities. In: Proceedings of the 4th international conference on information and knowledge management. ACM, New York
  4. Cowie J, Lehnert W (1996) Information extraction. Commun ACM 39(1):80–91
  5. Cunningham H, Maynard D, Bontcheva K, Tablan V (2002) GATE: a framework and graphical development environment for robust NLP tools and applications. In: Proceedings of the 40th annual meeting of the Association for Computational Linguistics
  6. Ding Y, Embley DW (2006) Using data-extraction ontologies to foster automating semantic annotation. In: Proceedings of the 22nd international conference on data engineering workshops (ICDEW’06). IEEE Computer Society, Washington, DC, p 138
  7. Gatterbauer W, Bohunsky P, Herzog M, Kroepl B, Pollak B (2007) Towards domain-independent information extraction from web tables. In: Proceedings of the 16th international world wide web conference. ACM, New York, pp 71–80
  8. GuoDong Z, Jian S (2003) Integrating various features in hidden Markov model using constraint relaxation algorithm for recognition of named entities without gazetteers. In: Proceedings of the international conference on natural language processing and knowledge engineering. IEEE Press, pp 465–470
  9. Jensen LJ, Saric J, Bork P (2006) Literature mining for the biologist: from information retrieval to biological discovery. Nat Rev Genet 7(2):119–129
  10. Levin B (1993) English verb classes and alternations: a preliminary investigation. University of Chicago Press, Chicago
  11. Sleator DD, Temperley D (1993) Parsing English with a link grammar. In: Proceedings of the third international workshop on parsing technologies (IWPT ’93). University of Tilburg, The Netherlands
  12. Soderland S (1997) Learning to extract text-based information from the world wide web. In: Proceedings of the 3rd international conference on knowledge discovery and data mining. AAAI Press, pp 251–254
  13. World Wide Web Consortium (W3C) (2004) Resource description framework (RDF). http://www.w3.org/RDF/

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Massimiliano Albanese (1)
  • Marat Fayzullin (2)
  • Jana Shakarian (2)
  • V. S. Subrahmanian (2)
  1. George Mason University, Fairfax, USA
  2. University of Maryland, College Park, USA
