MCD 2007: Mining Complex Data, pp. 82-92

Discovering Word Meanings Based on Frequent Termsets

  • Henryk Rybinski
  • Marzena Kryszkiewicz
  • Grzegorz Protaziuk
  • Aleksandra Kontkiewicz
  • Katarzyna Marcinkowska
  • Alexandre Delteil
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4944)

Abstract

Word meaning ambiguity has always been an important problem in information retrieval and extraction, as well as in text mining (document clustering and classification). Knowledge discovery tasks such as automatic ontology building and maintenance would also profit from simple and efficient methods for discovering word meanings. The paper presents a novel text mining approach to discovering word meanings. The proposed measures of their context are expressed by means of frequent termsets. The presented methods have been implemented with efficient data mining techniques. The approach is domain- and language-independent, although it requires applying a part-of-speech tagger. The paper includes sample results obtained with the presented methods.
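
To make the notion of a frequent termset concrete, the following is a minimal sketch, not the method of the paper. It assumes a termset is a set of terms that co-occur in at least `min_support` documents, and it enumerates candidates by brute force instead of using an efficient miner. The function names `frequent_termsets` and `contexts_of`, the toy documents, and the threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_termsets(documents, min_support, max_size=3):
    """Return {termset: support} for termsets found in >= min_support documents.

    Brute-force enumeration for illustration only; an Apriori-style miner
    would prune candidates instead of counting every combination.
    """
    counts = Counter()
    for doc in documents:
        terms = sorted(set(doc))  # each document counts a term at most once
        for size in range(1, max_size + 1):
            for combo in combinations(terms, size):
                counts[frozenset(combo)] += 1
    return {ts: n for ts, n in counts.items() if n >= min_support}


def contexts_of(word, termsets):
    """Frequent termsets that contain `word`, with the word itself removed."""
    return {ts - {word} for ts in termsets if word in ts and len(ts) > 1}


# Toy corpus (hypothetical): "apple" appears in two unrelated settings.
docs = [
    ["apple", "fruit", "juice"],
    ["apple", "fruit", "tree"],
    ["apple", "computer", "software"],
    ["apple", "computer", "keyboard"],
]

frequent = frequent_termsets(docs, min_support=2)
apple_contexts = contexts_of("apple", frequent)
print(apple_contexts)
```

On this toy corpus the word "apple" ends up with two separate frequent contexts, one containing "fruit" and one containing "computer". Distinct contexts of this kind are the sort of signal that could hint at more than one meaning; how the paper actually measures and compares contexts is not specified in the abstract.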

Keywords

Association rules, frequent termsets, homonyms, polysemy

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Henryk Rybinski (1)
  • Marzena Kryszkiewicz (1)
  • Grzegorz Protaziuk (1)
  • Aleksandra Kontkiewicz (1)
  • Katarzyna Marcinkowska (1)
  • Alexandre Delteil (2)

  1. ICS, Warsaw University of Technology
  2. France Telecom R&D
