Abstract
Detecting discriminant terms allows us to improve the performance of natural language processing systems. The goal is to estimate the contribution of each term in a given corpus and then to represent the corpus using the terms with the highest contribution. In this paper we present several experiments that use elliptic curves to discover the discriminant terms of a textual corpus. These experiments led us to use the mean and variance of the corpus terms to determine the parameters of a reduced Weierstrass equation (an elliptic curve). We use the elliptic curves to visualize the behavior of the corpus vocabulary graphically. We then use the elliptic curve parameters to cluster terms that share characteristics. These clusters serve as discriminant terms for representing the original document collection. Finally, we evaluate all these corpus representations to determine the terms that best discriminate each document.
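The abstract says that each term's mean and variance set the parameters of a reduced Weierstrass equation, y^2 = x^3 + ax + b. It does not say how the two statistics map onto a and b. The sketch below therefore assumes the simplest reading, a = mean frequency of the term across documents and b = its variance, and should not be taken as the authors' method. The helper names (`term_statistics`, `curve_params`, `cluster_terms`, `curve_points`) are hypothetical. So is the grouping rule, which places terms whose rounded (a, b) pairs coincide in the same cluster. The discriminant check uses the standard fact that the curve is nonsingular exactly when 4a^3 + 27b^2 ≠ 0.

```python
from collections import defaultdict
from statistics import mean, pvariance


def term_statistics(docs):
    """Per-term mean and population variance of raw frequency across documents."""
    vocab = sorted({t for d in docs for t in d})
    stats = {}
    for term in vocab:
        freqs = [d.count(term) for d in docs]  # frequency of term in each document
        stats[term] = (mean(freqs), pvariance(freqs))
    return stats


def curve_params(m, v):
    # ASSUMPTION: a = mean, b = variance. The abstract does not give the exact mapping.
    return m, v


def discriminant(a, b):
    # Discriminant of y^2 = x^3 + a*x + b; a nonzero value means the curve is nonsingular.
    return -16 * (4 * a**3 + 27 * b**2)


def cluster_terms(stats, ndigits=2):
    # HYPOTHETICAL grouping: terms whose rounded (a, b) parameters coincide share a cluster.
    clusters = defaultdict(list)
    for term, (m, v) in stats.items():
        a, b = curve_params(m, v)
        clusters[(round(a, ndigits), round(b, ndigits))].append(term)
    return dict(clusters)


def curve_points(a, b, xs):
    # Upper branch y = sqrt(x^3 + a*x + b) wherever the right-hand side is non-negative,
    # which is enough to plot and visually compare the curves of different terms.
    pts = []
    for x in xs:
        rhs = x**3 + a * x + b
        if rhs >= 0:
            pts.append((x, rhs ** 0.5))
    return pts


docs = [["trade", "tariff"], ["trade", "tariff"], ["curve"]]
stats = term_statistics(docs)
clusters = cluster_terms(stats)
```

In this toy corpus, "trade" and "tariff" have identical frequency profiles, so they share curve parameters and end up in the same cluster. "curve" lands in a cluster of its own.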
This work has been partially supported by the projects CONACYT #106625, VIEP #VIAD-ING11-I, #PIAD-ING11-I and #BEMB-ING11-I, as well as by the PROMEP/103.5/09/4213 grant.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vilariño, D., Pinto, D., Balderas, C., Tovar, M., Beltrán, B., Paniagua, S. (2011). Use of Elliptic Curves in Term Discrimination. In: Martínez-Trinidad, J.F., Carrasco-Ochoa, J.A., Ben-Youssef Brants, C., Hancock, E.R. (eds) Pattern Recognition. MCPR 2011. Lecture Notes in Computer Science, vol 6718. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21587-2_37
DOI: https://doi.org/10.1007/978-3-642-21587-2_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21586-5
Online ISBN: 978-3-642-21587-2
eBook Packages: Computer Science, Computer Science (R0)