Abstract
Data mining represents the cutting edge of information extraction; however, it always entails considerable expense because it requires structured data. Text mining has therefore emerged as a low-cost, reliable alternative. It can provide meaningful expert information without plentiful resources: all that is needed is a sufficiently large corpus of text to conduct research on almost any topic. Each approach provides valuable information on its own, but what would happen if the two processes were linked so that the results of one could be verified by the results of the other? With this idea in mind, we rely on one hypothesis: it is possible to establish a link between the two mining processes and to use them back and forth to verify one another. Hence, we describe both methodologies thoroughly, with special emphasis on the phases most likely to establish a strong link between them. We found that link in the fact that, once natural language processing has been performed on the chosen corpora, the output is a list of meaningful nouns that can be used as features to reliably guide a data mining process.
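As a rough illustration of the link described above, the following Python sketch shows how nouns extracted from a tagged corpus might be matched against the variables of a structured data set. It is not the chapter's implementation: the tagged tokens, the column names, and the helper functions `frequent_nouns` and `select_features` are all hypothetical, and the only assumption about the tagger is that it emits Penn Treebank style tags in which noun tags begin with "NN".

```python
from collections import Counter

# Hypothetical (token, POS tag) pairs, standing in for the output of a
# part-of-speech tagger run over a text corpus.
tagged_corpus = [
    ("obesity", "NN"), ("is", "VBZ"), ("linked", "VBN"), ("to", "TO"),
    ("diet", "NN"), ("and", "CC"), ("exercise", "NN"), (".", "."),
    ("diet", "NN"), ("affects", "VBZ"), ("weight", "NN"), (".", "."),
]

# Hypothetical variable names of a structured data set (not taken from
# any real survey).
survey_columns = ["age", "diet", "weight", "income", "exercise"]


def frequent_nouns(tagged, min_count=1):
    """Return nouns ordered by descending frequency.

    Assumes noun tags start with "NN" (NN, NNS, NNP, NNPS).
    """
    counts = Counter(
        token.lower() for token, tag in tagged if tag.startswith("NN")
    )
    return [noun for noun, n in counts.most_common() if n >= min_count]


def select_features(nouns, columns):
    """Keep only the columns whose name matches an extracted noun."""
    noun_set = set(nouns)
    return [column for column in columns if column.lower() in noun_set]


nouns = frequent_nouns(tagged_corpus)
features = select_features(nouns, survey_columns)
print(features)
```

Exact string matching is the simplest possible way to join the two sides. In practice the noun list and the variable names would rarely coincide so neatly, and some mapping between vocabulary and variables would be needed.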
Acknowledgments
VMCV is grateful to CONACYT (Consejo Nacional de Ciencia y Tecnología) for support, to Dr. César Cruz for his support and advice during the planning phase of the NLP methodology, and to various classmates for useful discussions. CRS is grateful for financial support from PAPIIT project IN113414. This work was partially funded by the C3 – Centro de Ciencias de la Complejidad.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Vargas, V.M.C., Stephens, C.R., Martínez, G.E.S., Rendón, A.M. (2015). NLP Methodology as Guidance and Verification of the Data Mining of Survey ENSANUT 2012. In: Pichardo Lagunas, O., Herrera Alcántara, O., Arroyo Figueroa, G. (eds) Advances in Artificial Intelligence and Its Applications. MICAI 2015. Lecture Notes in Computer Science, vol 9414. Springer, Cham. https://doi.org/10.1007/978-3-319-27101-9_10
DOI: https://doi.org/10.1007/978-3-319-27101-9_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27100-2
Online ISBN: 978-3-319-27101-9
eBook Packages: Computer Science, Computer Science (R0)