NLP Methodology as Guidance and Verification of the Data Mining of Survey ENSANUT 2012

Vargas, Víctor Manuel Corza; Stephens, Christopher R.; Martínez, Gerardo Eugenio Sierra; Rendón, Azucena Montes

doi:10.1007/978-3-319-27101-9_10

Víctor Manuel Corza Vargas, Christopher R. Stephens, Gerardo Eugenio Sierra Martínez & Azucena Montes Rendón

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9414)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

Data mining represents the cutting edge of information extraction; however, it always entails considerable expense because it requires “structured data”. Against this background, text mining appears on the horizon as a low-cost, reliable alternative. It can provide meaningful expert information without requiring plentiful resources: all that is needed is a fairly large (really large) corpus of text to conduct research on almost any topic. On their own, both approaches ultimately provide valuable information; but what would happen if the two processes were linked so that the results of one approach could be verified by the results of the other? With this idea in mind, we rely on one hypothesis: that it is possible to establish a link between the two mining processes and to use them back and forth to verify one another. Hence, we describe both methodologies thoroughly, with special emphasis on the phases most likely to establish a strong link between them. We found that link in the fact that, once Natural Language Processing has been performed on the chosen corpora, the output is a list of meaningful nouns that can be used as features to reliably guide a data mining process.
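The link described in the abstract, nouns extracted by NLP serving as features for data mining, can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the corpus has already been part-of-speech tagged with Penn Treebank style tags (noun tags starting with `NN`), and the function names `top_nouns` and `select_features`, the sample sentences, and the survey variable labels are all hypothetical.

```python
from collections import Counter


def top_nouns(tagged_tokens, k=3):
    """Return the k most frequent nouns from (word, tag) pairs.

    Assumption: tags follow the Penn Treebank convention, where
    every noun tag starts with "NN".
    """
    counts = Counter(
        word.lower()
        for word, tag in tagged_tokens
        if tag.startswith("NN")
    )
    return [word for word, _ in counts.most_common(k)]


def select_features(nouns, variable_labels):
    """Keep the survey variables whose label contains one of the nouns."""
    return [
        name
        for name, label in variable_labels.items()
        if any(noun in label.lower().split() for noun in nouns)
    ]


# Hypothetical pre-tagged corpus (illustrative only).
tagged = [
    ("Diabetes", "NN"), ("is", "VBZ"), ("linked", "VBN"), ("to", "TO"),
    ("obesity", "NN"), ("and", "CC"), ("diet", "NN"), (".", "."),
    ("Obesity", "NN"), ("raises", "VBZ"), ("diabetes", "NN"),
    ("risk", "NN"), (".", "."),
    ("Diabetes", "NN"), ("screening", "NN"), ("helps", "VBZ"), (".", "."),
]

# Hypothetical survey variables and their descriptions.
labels = {
    "v1": "diagnosed with diabetes",
    "v2": "household income",
    "v3": "obesity indicator",
}

nouns = top_nouns(tagged, k=2)
features = select_features(nouns, labels)
print(nouns)
print(features)
```

In this sketch the frequent nouns act as a filter over the survey's variable catalogue, so the data mining step would start from variables the text corpus suggests are relevant. A real pipeline would need a tagger suited to the corpus language and a more robust matching step than exact word overlap.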

Acknowledgments

VMCV is grateful to CONACYT (Consejo Nacional de Ciencia y Tecnología) for support, to Dr. César Cruz for his support and advice during the planning phase of the NLP methodology, and to various classmates for useful discussions. CRS is grateful for financial support from PAPIIT project IN113414. This work was partially funded by the C3 – Centro de Ciencias de la Complejidad.

Author information

Authors and Affiliations

Posgrado en Ciencia e Ingeniería de la Computación, UNAM, Mexico D.F., Mexico
Víctor Manuel Corza Vargas
C3 – Centro de Ciencias de la Complejidad, UNAM, Mexico D.F., Mexico
Christopher R. Stephens
Instituto de Ciencias Nucleares, UNAM, Mexico D.F., Mexico
Christopher R. Stephens
Instituto de Ingeniería, UNAM, Mexico D.F., Mexico
Gerardo Eugenio Sierra Martínez & Azucena Montes Rendón

Corresponding author

Correspondence to Víctor Manuel Corza Vargas.

Editor information

Editors and Affiliations

Unidad Profesional Interdisciplinaria, México DF, Mexico
Obdulia Pichardo Lagunas
Universidad Autónoma Metropolitana, México DF, Mexico
Oscar Herrera Alcántara
Instituto de Investigaciones Eléctricas, Cuernavaca, Morelos, Mexico
Gustavo Arroyo Figueroa

About this paper

Cite this paper

Vargas, V.M.C., Stephens, C.R., Martínez, G.E.S., Rendón, A.M. (2015). NLP Methodology as Guidance and Verification of the Data Mining of Survey ENSANUT 2012. In: Pichardo Lagunas, O., Herrera Alcántara, O., Arroyo Figueroa, G. (eds) Advances in Artificial Intelligence and Its Applications. MICAI 2015. Lecture Notes in Computer Science, vol 9414. Springer, Cham. https://doi.org/10.1007/978-3-319-27101-9_10

DOI: https://doi.org/10.1007/978-3-319-27101-9_10
Published: 10 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27100-2
Online ISBN: 978-3-319-27101-9
eBook Packages: Computer Science, Computer Science (R0)
