Text Mining at Detail Level Using Conceptual Graphs

  • Manuel Montes-y-Gómez
  • Alexander Gelbukh
  • Aurelio López-López
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2393)

Abstract

Text mining is defined as knowledge discovery in large text collections. It detects interesting patterns such as clusters, associations, deviations, similarities, and differences in sets of texts. Current text mining methods use simplistic representations of text content, such as keyword vectors, which severely limit the kind and meaningfulness of possible discoveries. We show how to perform some typical mining tasks using conceptual graphs as a formal but meaningful representation of texts. Our methods involve qualitative and quantitative comparison of conceptual graphs, conceptual clustering, construction of a conceptual hierarchy, and application of data mining techniques to this hierarchy in order to detect interesting associations and deviations. Our experiments show that, contrary to a widespread belief, detailed and meaningful mining with conceptual graphs is computationally affordable.

Keywords

text mining, conceptual graphs, conceptual clustering, association discovery, deviation detection

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Manuel Montes-y-Gómez (1)
  • Alexander Gelbukh (2)
  • Aurelio López-López (1)

  1. Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Mexico
  2. Centro de Investigación en Computación (CIC-IPN), Mexico
