
A Domain-Adapting Word Representation Method for Word Clustering

  • Conference paper
Recent Trends in Intelligent Computing, Communication and Devices

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1006)

Abstract

Extracting key information from text is a goal of the natural language processing (NLP) field. A few keywords can convey the main idea of a text; the complete vocabulary carries richer information but is harder to organize. This paper proposes a word representation method that combines frequently co-occurring entropy (FCE) with the fuzzy bag-of-words model (FBoW), named FCE-FBW. The method is used to cluster words from different domains, grouping similar words together. The resulting word clusters can be useful for tasks such as building knowledge-based domain repositories. FCE is used to pick out generalizable features. FBoW allows the same word to be described along multiple dimensions. The authors report that the combined FCE-FBW method performs well.
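
The abstract names the two components but does not give their formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes an FCE-style score of the form log(P_a * P_b / (|P_a - P_b| + beta)), which is high for words that are frequent and about equally frequent in two domains, and a fuzzy membership defined as cosine similarity clipped at zero. The function names, smoothing constants, document counts, and two-dimensional word vectors are all hypothetical.

```python
import math

def fce_score(count_a, count_b, total_a, total_b, alpha=1e-4, beta=1e-4):
    """Assumed FCE-style score: high when a word is frequent and
    about equally frequent in both domains (alpha, beta are smoothing)."""
    p_a = (count_a + alpha) / (total_a + 2 * alpha)
    p_b = (count_b + alpha) / (total_b + 2 * alpha)
    return math.log(p_a * p_b / (abs(p_a - p_b) + beta))

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def fuzzy_representation(word_vec, feature_vecs):
    """Assumed fuzzy membership: one dimension per feature word,
    valued by cosine similarity clipped at zero."""
    return [max(0.0, cosine(word_vec, f)) for f in feature_vecs]

# Hypothetical document frequencies: (domain A, domain B), 100 documents each.
doc_freq = {"good": (40, 38), "battery": (30, 1), "plot": (2, 35), "price": (25, 22)}
scores = {w: fce_score(a, b, 100, 100) for w, (a, b) in doc_freq.items()}

# Keep the two highest-scoring words as the shared feature dimensions.
features = sorted(scores, key=scores.get, reverse=True)[:2]

# Hypothetical word embeddings; a real setup would use trained vectors.
vectors = {"good": [1.0, 0.0], "price": [0.0, 1.0],
           "cheap": [0.6, 0.8], "bad": [-1.0, 0.0]}
feature_vecs = [vectors[f] for f in features]
representations = {w: fuzzy_representation(v, feature_vecs)
                   for w, v in vectors.items()}
```

In this toy setup each word is described by its degree of membership in every selected feature rather than by a single one-hot dimension, and the resulting vectors could then be passed to any standard clustering algorithm.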



Acknowledgments

This work was sponsored by the Jilin Provincial Science and Technology Department of China (Grant No. 20170204002GX) and the Jilin Province Development and Reform Commission of China (Grant No. 2014Y056). We would like to thank these organizations for their support.

Corresponding author

Correspondence to Tieli Sun.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhou, W., Wang, H., Sun, H., Sun, T. (2020). A Domain-Adapting Word Representation Method for Word Clustering. In: Jain, V., Patnaik, S., Popențiu Vlădicescu, F., Sethi, I. (eds) Recent Trends in Intelligent Computing, Communication and Devices. Advances in Intelligent Systems and Computing, vol 1006. Springer, Singapore. https://doi.org/10.1007/978-981-13-9406-5_18
