Semantic Concept Discovery over Event Databases

  • Oktie Hassanzadeh
  • Shari Trewin
  • Alfio Gliozzo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10843)


In this paper, we study the problem of identifying certain types of concepts (e.g., persons, organizations, topics) for a given analysis question, with the goal of assisting a human analyst in writing a deep analysis report. We consider a case where we have a large event database describing events and their associated news articles, along with metadata describing various event attributes such as the people and organizations involved and the topic of the event. We describe the use of semantic technologies in question understanding and deep analysis of the event database, and present a detailed evaluation of our proposed concept discovery techniques using reports from the Human Rights Watch organization and other sources. Our study finds that combining our neural-network-based semantic term embeddings over structured data with an index-based method can significantly outperform either method alone.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. IBM Research, Yorktown Heights, USA
