Abstract
Extracting key information from text is a goal of natural language processing (NLP). A few keywords can convey the main idea of a text, whereas the complete vocabulary carries richer information but is harder to organize. This paper proposes a word representation method that combines frequently co-occurring entropy (FCE) with the fuzzy bag-of-words model (FBoW), named FCE-FBW. The method clusters words from different domains, grouping similar words together. The resulting word clusters are useful for tasks such as building knowledge-based domain repositories. FCE selects the generalizable features, and FBoW allows the same word to be described along multiple dimensions. Combining the two models, the proposed FCE-FBW method performs well on word clustering.
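To make the idea of describing one word along multiple dimensions concrete, the following is a minimal sketch of fuzzy bag-of-words style soft matching. It is not the FCE-FBW method itself. It assumes a fuzzy membership equal to the cosine similarity between word vectors, clipped at zero. The toy vectors, the basis terms, and the function names `cosine` and `fuzzy_bow` are hypothetical and chosen only for illustration.

```python
import math

# Toy 2-D word vectors; hypothetical values for illustration only.
TOY_VECTORS = {
    "price": (1.0, 0.0),
    "cost": (0.8, 0.6),
    "screen": (0.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fuzzy_bow(tokens, basis_terms, vectors):
    """Return one weight per basis term, accumulated by soft matching.

    Assumption: the membership of a token in a basis term is their cosine
    similarity, clipped at zero, so a token can contribute to several
    dimensions at once instead of exactly one.
    """
    weights = []
    for term in basis_terms:
        total = 0.0
        for tok in tokens:
            total += max(0.0, cosine(vectors[term], vectors[tok]))
        weights.append(total)
    return weights


basis = ["price", "screen"]
vec = fuzzy_bow(["cost"], basis, TOY_VECTORS)
print(dict(zip(basis, vec)))
```

With a hard bag-of-words mapping, "cost" would match neither basis term. Under the soft mapping it receives a nonzero weight on both dimensions, with the larger weight on "price".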
Acknowledgments
This work was sponsored by the Jilin Provincial Science and Technology Department of China (Grant No. 20170204002GX) and the Jilin Province Development and Reform Commission of China (Grant No. 2014Y056). We thank these organizations for their support.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhou, W., Wang, H., Sun, H., Sun, T. (2020). A Domain-Adapting Word Representation Method for Word Clustering. In: Jain, V., Patnaik, S., Popențiu Vlădicescu, F., Sethi, I. (eds) Recent Trends in Intelligent Computing, Communication and Devices. Advances in Intelligent Systems and Computing, vol 1006. Springer, Singapore. https://doi.org/10.1007/978-981-13-9406-5_18
DOI: https://doi.org/10.1007/978-981-13-9406-5_18
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9405-8
Online ISBN: 978-981-13-9406-5
eBook Packages: Intelligent Technologies and Robotics (R0)