Abstract
Text data has grown dramatically in recent years, mainly due to advances in web-related technologies that enable people to produce an overwhelming amount of data. Much knowledge about the world is encoded in text data available through blogs, tweets, web pages, articles, and books.
This paper introduces some general techniques for text data mining, based on text retrieval models, that are applicable to any text in any natural language. The techniques are aimed at problems that require minimal or no human effort. These techniques, which can be used in many applications, allow the measurement of the similarity of contexts, as well as the co-occurrence of terms, in text data at different levels of granularity.
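The two kinds of measurement named above can be illustrated with a minimal sketch. The abstract does not say which measures the chapter uses, so cosine similarity over neighbouring-word counts and mutual information over segment-level presence are assumptions chosen here for illustration. The function names, the `window` parameter, and the toy `segments` data are all hypothetical. A segment stands for whatever granularity is chosen (sentence, paragraph, or document).

```python
import math
from collections import Counter


def context_vectors(segments, window=1):
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors = {}
    for tokens in segments:
        for i, word in enumerate(tokens):
            ctx = vectors.setdefault(word, Counter())
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mutual_information(segments, w1, w2):
    """Mutual information (in bits) between the presence of w1 and of w2 in a segment."""
    n = len(segments)
    sets = [set(tokens) for tokens in segments]
    n1 = sum(w1 in s for s in sets)
    n2 = sum(w2 in s for s in sets)
    n11 = sum(w1 in s and w2 in s for s in sets)
    # Joint counts for (w1 present?, w2 present?), derived from integer counts.
    joint = {
        (1, 1): n11,
        (1, 0): n1 - n11,
        (0, 1): n2 - n11,
        (0, 0): n - n1 - n2 + n11,
    }
    marg1 = {1: n1, 0: n - n1}
    marg2 = {1: n2, 0: n - n2}
    mi = 0.0
    for (a, b), count in joint.items():
        if count > 0:  # zero-probability cells contribute nothing
            mi += (count / n) * math.log2(count * n / (marg1[a] * marg2[b]))
    return mi


# Toy data: each inner list is one segment (here, a tokenized sentence).
segments = [
    ["the", "cat", "eats", "fish"],
    ["the", "dog", "eats", "meat"],
    ["the", "cat", "drinks", "milk"],
    ["the", "dog", "drinks", "water"],
]

vectors = context_vectors(segments, window=1)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])    # identical contexts
sim_cat_fish = cosine(vectors["cat"], vectors["fish"])  # different contexts
mi_eats_fish = mutual_information(segments, "eats", "fish")
mi_the_cat = mutual_information(segments, "the", "cat")
```

In this toy corpus, "cat" and "dog" appear in the same contexts and so score a higher context similarity than "cat" and "fish". "the" occurs in every segment, so its presence carries no information about "cat" and the mutual information is zero. Changing what counts as a segment, or the `window` size, changes the granularity at which both measures operate.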
Acknowledgements
This work was supported by Portuguese funds through the Center of Naval Research (CINAV), Portuguese Naval Academy, Portugal, and the Portuguese Foundation for Science and Technology (FCT), through the Center for Computational and Stochastic Mathematics (CEMAT), University of Lisbon, Portugal, project UID/Multi/04621/2013.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Correia, A., Teodoro, M.F., Lobo, V. (2018). Statistical Methods for Word Association in Text Mining. In: Oliveira, T., Kitsos, C., Oliveira, A., Grilo, L. (eds) Recent Studies on Risk Analysis and Statistical Modeling. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-76605-8_27
DOI: https://doi.org/10.1007/978-3-319-76605-8_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76604-1
Online ISBN: 978-3-319-76605-8
eBook Packages: Mathematics and Statistics (R0)