Detecting Deviations in Text Collections: An Approach Using Conceptual Graphs

Montes-y-Gómez, M.; Gelbukh, A.; López-López, A.

doi:10.1007/3-540-46016-0_19

Conference paper
First Online: 01 January 2002

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2313)

Abstract

Deviation detection is an important problem in both data mining and text mining. In this paper we consider the detection of deviations in a set of texts represented as conceptual graphs. In contrast with statistical and distance-based approaches, the method we propose is based on the concepts of generalization and regularity. Its main characteristics are the detection of rare patterns (which attempt to give a generalized description of rare texts) and the ability to discover local deviations (deviations in different contexts and at different generalization levels). The method is illustrated with the analysis of a set of computer science papers.

Author information

Authors and Affiliations

Center for Computing Research (CIC-IPN), México
M. Montes-y-Gómez & A. Gelbukh
Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), México
M. Montes-y-Gómez & A. López-López

Editor information

Editors and Affiliations

Computer Science Section, Electrical Engineering Department, CINVESTAV-IPN, Av. IPN 2508, Col. San Pedro Zacatenco, D.F. 07300, Mexico, Mexico
Carlos A. Coello Coello
Computer Science Department, ITESM-Mexico City, Calle del Puente 222, Tlalpan, D.F. 14380, Mexico, Mexico
Alvaro de Albornoz
Computer Science Department, ITESM-Cuernavaca, Reforma 182-A, Lomas de Cuernavaca, Temixco, 62589, Morelos, Mexico
Luis Enrique Sucar
Department of Computer Science, ITAM, Rio Hondo 1, Progreso Tizapan, D.F. 01000, Mexico, Mexico
Osvaldo Cairó Battistutti

About this paper

Cite this paper

Montes-y-Gómez, M., Gelbukh, A., López-López, A. (2002). Detecting Deviations in Text Collections: An Approach Using Conceptual Graphs. In: Coello Coello, C.A., de Albornoz, A., Sucar, L.E., Battistutti, O.C. (eds) MICAI 2002: Advances in Artificial Intelligence. MICAI 2002. Lecture Notes in Computer Science, vol 2313. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46016-0_19

DOI: https://doi.org/10.1007/3-540-46016-0_19
Published: 07 May 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43475-7
Online ISBN: 978-3-540-46016-9
eBook Packages: Springer Book Archive