TermWatch II: Unsupervised Terminology Graph Extraction and Decomposition

  • Conference paper
Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2011)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 348)

Abstract

We present a symbolic and graph-based approach for mapping knowledge domains. The symbolic component relies on shallow linguistic processing of texts to extract multi-word terms and cluster them based on lexico-syntactic relations. The clusters are subjected to graph decomposition based on inherent graph-theoretic properties of association graphs of items (multi-word terms and authors). This includes the search for complete (clique) minimal separators that can decompose the graphs into central (core topics) and peripheral atoms. The methodology is implemented in the TermWatch system and can be used for several text mining tasks. In this paper, we apply our methodology to map the dynamics of terrorism research from 1990 to 2006. We also mined frequent itemsets as a means of revealing dependencies between formal concepts in the corpus. A comparison of the extracted frequent itemsets and the structure of the central atom shows an interesting overlap. The main features of our approach lie in the combination of state-of-the-art techniques from Natural Language Processing (NLP), clustering and graph theory to develop a system and a methodology adapted to uncovering hidden sub-structures in texts.
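The decomposition of a graph into atoms by complete (clique) minimal separators can be illustrated with a small sketch. This is not TermWatch's implementation: it is a naive, exponential-time Python illustration that assumes graphs small enough to enumerate vertex subsets. It follows the standard definition: a clique S is a minimal separator when G − S has at least two full components (components C whose neighbourhood N(C) is all of S), and the graph is split into the pieces C ∪ N(C) until no such separator remains. Polynomial algorithms exist (for example via minimal triangulation). The function names and the toy graph are hypothetical.

```python
from __future__ import annotations

from itertools import combinations

# Undirected graph as adjacency sets (e.g. term -> associated terms).
Graph = dict[str, set[str]]


def components(graph: Graph, removed: set[str]) -> list[set[str]]:
    """Connected components of the graph after deleting the vertices in `removed`."""
    seen = set(removed)
    comps = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:  # iterative depth-first search
            v = stack.pop()
            comp.add(v)
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def neighbourhood(graph: Graph, comp: set[str]) -> set[str]:
    """Vertices outside `comp` adjacent to at least one vertex of `comp`."""
    return {w for v in comp for w in graph[v]} - comp


def is_clique(graph: Graph, vertices: set[str]) -> bool:
    return all(b in graph[a] for a, b in combinations(vertices, 2))


def find_clique_minimal_separator(graph: Graph) -> set[str] | None:
    """Brute force: a smallest clique S whose removal leaves >= 2 full components.

    A component C of G - S is 'full' when N(C) == S; S is a minimal
    separator exactly when at least two full components exist.
    Exponential in |V|: only for tiny illustrative graphs.
    """
    vertices = sorted(graph)
    for size in range(len(vertices) - 1):
        for candidate in combinations(vertices, size):
            sep = set(candidate)
            if not is_clique(graph, sep):
                continue
            full = [c for c in components(graph, sep)
                    if neighbourhood(graph, c) == sep]
            if len(full) >= 2:
                return sep
    return None


def induced(graph: Graph, keep: set[str]) -> Graph:
    return {v: graph[v] & keep for v in keep}


def atoms(graph: Graph) -> list[frozenset[str]]:
    """Recursively split along clique minimal separators until none remain."""
    sep = find_clique_minimal_separator(graph)
    if sep is None:
        return [frozenset(graph)]  # no clique separator: this piece is an atom
    pieces = []
    for comp in components(graph, sep):
        # each component keeps the part of the separator it touches
        pieces.extend(atoms(induced(graph, comp | neighbourhood(graph, comp))))
    unique = set(pieces)
    return [a for a in unique if not any(a < b for b in unique)]


if __name__ == "__main__":
    # Hypothetical toy graph (not data from the paper): a chordless 4-cycle
    # A-B-C-D, a pendant vertex E on A, and a triangle C-F-G hanging off C.
    toy = {
        "A": {"B", "D", "E"},
        "B": {"A", "C"},
        "C": {"B", "D", "F", "G"},
        "D": {"A", "C"},
        "E": {"A"},
        "F": {"C", "G"},
        "G": {"C", "F"},
    }
    for atom in sorted(atoms(toy), key=sorted):
        print(sorted(atom))
```

On the toy graph the sketch prints three atoms: `['A', 'B', 'C', 'D']`, `['A', 'E']` and `['C', 'F', 'G']`. The chordless 4-cycle survives as one atom because its only minimal separators ({A, C} and {B, D}) are not cliques. Which atom counts as "central" is left open here. Choosing the largest one would be one possible heuristic, not necessarily the criterion used in the paper.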

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

SanJuan, E. (2013). TermWatch II: Unsupervised Terminology Graph Extraction and Decomposition. In: Fred, A., Dietz, J.L.G., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2011. Communications in Computer and Information Science, vol 348. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37186-8_12

  • DOI: https://doi.org/10.1007/978-3-642-37186-8_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37185-1

  • Online ISBN: 978-3-642-37186-8

  • eBook Packages: Computer Science, Computer Science (R0)
