Statistical Methods for Word Association in Text Mining

Recent Studies on Risk Analysis and Statistical Modeling

Part of the book series: Contributions to Statistics (CONTRIB.STAT.)

Abstract

Text data has grown dramatically in recent years, mainly due to advances in web-related technologies that enable people to produce an overwhelming amount of data. Much knowledge about the world is encoded in text data available through blogs, tweets, web pages, articles, and books.

This paper introduces some general techniques for text data mining, based on text retrieval models, that can be applied to any text in any natural language. The techniques target problems requiring minimal or no human effort. They can be used in many applications and allow the measurement of the similarity of contexts, as well as the co-occurrence of terms, in text data at different levels of granularity.
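The two measurements named in the abstract can be illustrated with a minimal sketch. It is not taken from the chapter: the whitespace tokenizer, the smoothing constant, the function names, and the toy segments are all assumptions made for illustration. Co-occurrence is scored here with mutual information over the presence or absence of two terms in each segment, and context similarity with the cosine between bag-of-words context vectors. The "granularity" corresponds to what is passed in as a segment (a sentence, a paragraph, or a document).

```python
import math
from collections import Counter


def tokenize(segment):
    # Assumption: lowercase whitespace tokenization, which is language-agnostic but crude.
    return segment.lower().split()


def mutual_information(segments, w1, w2, smoothing=0.25):
    """Mutual information (in bits) between the presence of w1 and w2 in a segment."""
    n = len(segments)
    token_sets = [set(tokenize(s)) for s in segments]
    both = sum(1 for t in token_sets if w1 in t and w2 in t)
    only1 = sum(1 for t in token_sets if w1 in t and w2 not in t)
    only2 = sum(1 for t in token_sets if w1 not in t and w2 in t)
    neither = n - both - only1 - only2
    # Additive smoothing (an assumed constant) keeps every joint probability non-zero.
    total = n + 4 * smoothing
    joint = {
        (1, 1): (both + smoothing) / total,
        (1, 0): (only1 + smoothing) / total,
        (0, 1): (only2 + smoothing) / total,
        (0, 0): (neither + smoothing) / total,
    }
    p1 = {a: joint[(a, 0)] + joint[(a, 1)] for a in (0, 1)}
    p2 = {b: joint[(0, b)] + joint[(1, b)] for b in (0, 1)}
    return sum(p * math.log2(p / (p1[a] * p2[b])) for (a, b), p in joint.items())


def context_vector(segments, word):
    """Bag of words that share a segment with `word` (the word itself excluded)."""
    context = Counter()
    for segment in segments:
        tokens = tokenize(segment)
        if word in tokens:
            context.update(t for t in tokens if t != word)
    return context


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy data, invented for illustration; each string is one segment.
segments = [
    "the cat eats fish",
    "the dog eats meat",
    "the cat drinks milk",
    "a dog drinks water",
    "birds fly south",
]

print(mutual_information(segments, "cat", "south"))
print(cosine(context_vector(segments, "cat"), context_vector(segments, "dog")))
print(cosine(context_vector(segments, "cat"), context_vector(segments, "birds")))
```

In this toy corpus "cat" and "dog" appear in similar contexts, so their context vectors have a higher cosine than "cat" and "birds", which share no context words. A practical system would likely weight the context vectors (for example with TF-IDF) rather than use raw counts.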

Acknowledgements

This work was supported by Portuguese funds through the Center of Naval Research (CINAV), Portuguese Naval Academy, Portugal and The Portuguese Foundation for Science and Technology (FCT), through the Center for Computational and Stochastic Mathematics (CEMAT), University of Lisbon, Portugal, project UID/Multi/04621/2013.

Author information

Corresponding author

Correspondence to Anacleto Correia.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Correia, A., Teodoro, M.F., Lobo, V. (2018). Statistical Methods for Word Association in Text Mining. In: Oliveira, T., Kitsos, C., Oliveira, A., Grilo, L. (eds) Recent Studies on Risk Analysis and Statistical Modeling. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-76605-8_27
