NLP Methodology as Guidance and Verification of the Data Mining of Survey ENSANUT 2012

  • Conference paper
Advances in Artificial Intelligence and Its Applications (MICAI 2015)

Abstract

Data mining represents the cutting edge of information extraction; however, it always entails considerable expense because it requires "structured data". Text mining emerges as a low-cost, reliable alternative. It can provide meaningful expert information without plentiful resources: all that is needed is a sufficiently large corpus of text to conduct research on almost any topic. Each approach provides valuable information on its own; but what would happen if the two processes were linked so that the results of one could be verified by the results of the other? With this idea in mind, we rely on one hypothesis: that it is possible to establish a link between the two mining processes and to use them back and forth to verify one another. Hence, we describe both methodologies thoroughly, with special emphasis on the phases that tend to establish a strong link between them. We found that link in the fact that, once natural language processing has been performed on the chosen corpora, the output is a list of meaningful nouns that can be used as features to reliably guide a data mining process.
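As a rough illustration of the link described in the abstract, the following minimal sketch counts nouns in part-of-speech-tagged text and uses the most frequent ones to pick candidate columns from a structured survey table. It is not the authors' implementation: the tagged tokens, the column names, and the helper functions `top_nouns` and `select_features` are all hypothetical, and the sketch assumes Penn Treebank style tags (noun tags beginning with "NN") such as an English tagger might emit.

```python
from collections import Counter

# Hypothetical (token, POS tag) pairs, standing in for tagger output.
# Assumption: Penn Treebank style tags, where noun tags start with "NN".
tagged_corpus = [
    ("obesity", "NN"), ("increases", "VBZ"), ("the", "DT"), ("risk", "NN"),
    ("of", "IN"), ("diabetes", "NN"), ("and", "CC"), ("hypertension", "NN"),
    ("diabetes", "NN"), ("affects", "VBZ"), ("adults", "NNS"),
    ("with", "IN"), ("obesity", "NN"),
]

# Hypothetical survey columns, each mapped to one descriptive keyword.
survey_columns = {
    "diag_diabetes": "diabetes",
    "bmi_obesity": "obesity",
    "household_size": "household",
}


def top_nouns(tagged, k):
    """Return the k most frequent nouns (lowercased) in the tagged text."""
    counts = Counter(tok.lower() for tok, tag in tagged if tag.startswith("NN"))
    return [noun for noun, _ in counts.most_common(k)]


def select_features(nouns, columns):
    """Keep the columns whose keyword appears among the extracted nouns."""
    wanted = set(nouns)
    return sorted(col for col, keyword in columns.items() if keyword in wanted)


nouns = top_nouns(tagged_corpus, k=2)
features = select_features(nouns, survey_columns)
print(nouns)
print(features)
```

The selected columns could then be passed to a data mining step, and the patterns found there compared against the nouns to check one process against the other.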

Notes

  1. http://ensanut.insp.mx/.

  2. http://www.sketchengine.co.uk.

  3. http://nlp.stanford.edu/software/corenlp.shtml [4–6].

  4. http://www.sketchengine.co.uk/ [7].

  5. http://www.oracle.com/technetwork/es/java/javase/downloads/index.html.

  6. http://www.inegi.org.mx/.

  7. http://ensanut.insp.mx/.

References

  1. Hand, D.J., Mannila, H., Smyth, P.: Principles of Data Mining, pp. 211–234. The MIT Press, Cambridge (2001)

  2. Orallo, J.H., Quintana, M.J.R., Ramírez, C.F.: Introducción a la minería de datos, pp. 257–278. Universidad Politécnica de Valencia, Pearson, Prentice Hall, Madrid (2004)

  3. Weiss, S.M., Indurkhya, N., Zhang, T., Damerau, F.J.: Text Mining, Predictive Methods for Analyzing Unstructured Information, pp. 47–82, 85–101. Springer, United States of America (2010)

  4. Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S.J., McClosky, D.: The Stanford CoreNLP natural language processing toolkit. In: Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55–60 (2014)

  5. Toutanova, K., Manning, C.D.: Enriching the knowledge sources used in a maximum entropy part-of-speech tagger. In: Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-2000), pp. 63–70 (2000)

  6. Toutanova, K., Klein, D., Manning, C.D., Singer, Y.: Feature-rich part-of-speech tagging with a cyclic dependency network. In: Proceedings of HLT-NAACL 2003, pp. 252–259 (2003)

  7. Kilgarriff, A., et al.: The sketch engine: ten years on. Lexicography 1, 1–30 (2014)

  8. Orallo, J.H., Quintana, J.R., Ferri, C.: Introducción a la Minería de Datos, pp. 260–271. Pearson Prentice Hall, Englewood Cliffs (2004)

Acknowledgments

VMCV is grateful to CONACYT (Consejo Nacional de Ciencia y Tecnología) for support, to Dr. César Cruz for his support and advice during the planning phase of the NLP methodology, and to various classmates for useful discussions. CRS is grateful for financial support from PAPIIT project IN113414. This work was partially funded by the C3 – Centro de Ciencias de la Complejidad.

Author information

Corresponding author

Correspondence to Víctor Manuel Corza Vargas.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Vargas, V.M.C., Stephens, C.R., Martínez, G.E.S., Rendón, A.M. (2015). NLP Methodology as Guidance and Verification of the Data Mining of Survey ENSANUT 2012. In: Pichardo Lagunas, O., Herrera Alcántara, O., Arroyo Figueroa, G. (eds) Advances in Artificial Intelligence and Its Applications. MICAI 2015. Lecture Notes in Computer Science, vol 9414. Springer, Cham. https://doi.org/10.1007/978-3-319-27101-9_10

  • DOI: https://doi.org/10.1007/978-3-319-27101-9_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27100-2

  • Online ISBN: 978-3-319-27101-9

  • eBook Packages: Computer Science, Computer Science (R0)
