Natural Language Processing Methods Used for Automatic Prediction Mechanism of Related Phenomenon

  • Conference paper
Artificial Intelligence and Soft Computing (ICAISC 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9120)

Abstract

This paper proposes combining a variety of Natural Language Processing techniques with different classification methods to automatically predict a phenomenon related to a text. Several preprocessing techniques are applied and evaluated in order to find the best-performing set. The approach is assumed to make it possible to recognize the phenomenon to which a text relates. The research uses real input from big data systems: articles from news websites are the source of the raw text data. The paper proposes new, promising methods of automatic data and content mining for big data systems. The reported accuracy is considerably higher than the average accuracy achieved by humans in sentiment analysis.
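The abstract does not name the specific preprocessing steps or classifiers, so the following is only a minimal sketch of the general idea: try every subset of a few preprocessing steps, train a classifier on each variant, and keep the subset with the best held-out accuracy. The step names, the crude suffix stripper, the small multinomial Naive Bayes classifier, and the toy headlines are all assumptions made for illustration, not the authors' method or data.

```python
import itertools
import math
from collections import Counter, defaultdict

# Hypothetical preprocessing steps (assumed; not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are"}

def lowercase(tokens):
    return [t.lower() for t in tokens]

def remove_stopwords(tokens):
    return [t for t in tokens if t.lower() not in STOPWORDS]

def crude_stem(tokens):
    # Rough suffix stripping, a stand-in for real stemming or lemmatization.
    out = []
    for t in tokens:
        for suffix in ("ing", "ed", "s"):
            if t.endswith(suffix) and len(t) > len(suffix) + 2:
                t = t[: -len(suffix)]
                break
        out.append(t)
    return out

# Steps are always applied in this fixed order.
STEP_ORDER = ["lowercase", "stopwords", "stem"]
STEPS = {"lowercase": lowercase, "stopwords": remove_stopwords, "stem": crude_stem}

def preprocess(text, step_names):
    tokens = text.split()
    for name in step_names:
        tokens = STEPS[name](tokens)
    return tokens

def train_nb(docs):
    """docs: list of (tokens, label). Collects counts for multinomial Naive Bayes."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict_nb(model, tokens):
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            # Add-one (Laplace) smoothing for unseen words.
            score += math.log((word_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def evaluate(step_names, train, test):
    model = train_nb([(preprocess(text, step_names), y) for text, y in train])
    hits = sum(predict_nb(model, preprocess(text, step_names)) == y for text, y in test)
    return hits / len(test)

def all_step_subsets():
    for r in range(len(STEP_ORDER) + 1):
        yield from itertools.combinations(STEP_ORDER, r)

def best_step_subset(train, test):
    # Ties are resolved in favour of the first subset generated.
    return max(all_step_subsets(), key=lambda s: evaluate(s, train, test))

# Invented toy headlines, for illustration only.
TRAIN = [
    ("Markets are rising after strong earnings", "up"),
    ("Shares jumped and investors cheered", "up"),
    ("Stocks falling as fears spread", "down"),
    ("Prices dropped in heavy selling", "down"),
]
TEST = [("The market jumps on earnings", "up"), ("Stock prices fall", "down")]

if __name__ == "__main__":
    for subset in all_step_subsets():
        print(subset, evaluate(subset, TRAIN, TEST))
    print("best:", best_step_subset(TRAIN, TEST))
```

With realistic data, the held-out split would normally be replaced by cross-validation, and the search would also range over several classifiers rather than a single one.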

Author information

Corresponding author

Correspondence to Krystian Horecki.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Horecki, K., Mazurkiewicz, J. (2015). Natural Language Processing Methods Used for Automatic Prediction Mechanism of Related Phenomenon. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2015. Lecture Notes in Computer Science, vol 9120. Springer, Cham. https://doi.org/10.1007/978-3-319-19369-4_2

  • DOI: https://doi.org/10.1007/978-3-319-19369-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19368-7

  • Online ISBN: 978-3-319-19369-4

  • eBook Packages: Computer Science, Computer Science (R0)
