Feature Representation Based on Improved Word-Vector Clustering Using AP and E2LSH

  • Conference paper
Theory, Methodology, Tools and Applications for Modeling and Simulation of Complex Systems (AsiaSim 2016, SCS AutumnSim 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 646)

Abstract

Deep learning models have shown clear advantages in feature representation and document retrieval. However, such models typically take only the most frequent words as input when learning latent features, which inevitably discards much of the useful information contained in documents, especially high-dimensional ones. We introduce a novel method based on word-vector clustering that produces low-dimensional semantic vectors of documents, which serve as the input to a deep learning model and improve the feature representation at its output layer. First, word vectors, a compact and distributed representation of words, are obtained by training a neural network language model with word2vec. Then, we present a modified word-vector clustering method based on locality-sensitive hashing and affinity propagation, which adapts and scales better to large, high-dimensional data. Finally, each document is represented by the set of cluster centers and fed to the deep learning model. Experimental results show that the proposed method improves the feature representation ability of the deep learning model and outperforms traditional methods on a document retrieval task.
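The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that E2LSH means the standard p-stable hash h(v) = floor((a·v + b) / w) with Gaussian a, that affinity propagation is run independently inside each hash bucket with a median preference, and that a document is summarized by counting which center each of its word vectors falls nearest to. The function names (`e2lsh_buckets`, `affinity_propagation`, `cluster_centers`, `document_histogram`) and the parameter values (`k`, `w`, `damping`, `iters`) are hypothetical choices made for this example, and the toy 2-D vectors stand in for real word2vec output.

```python
import math
import random
from collections import defaultdict


def e2lsh_buckets(vectors, k=2, w=4.0, seed=0):
    """Group vector indices by a k-part p-stable (Gaussian) LSH key."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # Each hash is h(v) = floor((a . v + b) / w), a ~ N(0, I), b ~ U[0, w).
    funcs = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
             for _ in range(k)]
    buckets = defaultdict(list)
    for idx, v in enumerate(vectors):
        key = tuple(math.floor((sum(x * y for x, y in zip(a, v)) + b) / w)
                    for a, b in funcs)
        buckets[key].append(idx)
    return list(buckets.values())


def affinity_propagation(points, damping=0.5, iters=100):
    """Return, for every point, the index of its exemplar within `points`."""
    n = len(points)
    if n == 1:
        return [0]
    # Similarity is the negative squared Euclidean distance.
    s = [[-sum((x - y) ** 2 for x, y in zip(p, q)) for q in points]
         for p in points]
    off_diag = sorted(s[i][j] for i in range(n) for j in range(n) if i != j)
    pref = off_diag[len(off_diag) // 2]  # assumed: median preference
    for i in range(n):
        s[i][i] = pref
    r = [[0.0] * n for _ in range(n)]  # responsibilities
    a = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iters):
        for i in range(n):
            for c in range(n):
                rival = max(a[i][j] + s[i][j] for j in range(n) if j != c)
                r[i][c] = damping * r[i][c] + (1 - damping) * (s[i][c] - rival)
        for c in range(n):
            pos = [max(0.0, r[i][c]) for i in range(n)]
            total = sum(pos)
            for i in range(n):
                if i == c:
                    new = total - pos[c]
                else:
                    new = min(0.0, r[c][c] + total - pos[i] - pos[c])
                a[i][c] = damping * a[i][c] + (1 - damping) * new
    score = [r[c][c] + a[c][c] for c in range(n)]
    exemplars = [c for c in range(n) if score[c] > 0]
    if not exemplars:  # fall back to the single best candidate
        exemplars = [max(range(n), key=score.__getitem__)]
    labels = [max(exemplars, key=lambda c: s[i][c]) for i in range(n)]
    for c in exemplars:
        labels[c] = c
    return labels


def cluster_centers(vectors, **lsh_kwargs):
    """Run AP inside every LSH bucket and collect the exemplar vectors."""
    centers = []
    for bucket in e2lsh_buckets(vectors, **lsh_kwargs):
        pts = [vectors[i] for i in bucket]
        labels = affinity_propagation(pts)
        centers.extend(pts[c] for c in sorted(set(labels)))
    return centers


def document_histogram(doc_vectors, centers):
    """Count how many of a document's word vectors fall nearest each center."""
    hist = [0] * len(centers)
    for v in doc_vectors:
        nearest = min(range(len(centers)), key=lambda c: sum(
            (x - y) ** 2 for x, y in zip(v, centers[c])))
        hist[nearest] += 1
    return hist


# Toy 2-D "word vectors" forming two well-separated groups.
words = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
         [9.0, 9.1], [9.1, 9.0], [9.05, 8.95]]
centers = cluster_centers(words)
print(len(centers), document_histogram(words[:4], centers))
```

Hashing first keeps each affinity propagation run small, since its message passing costs quadratic time and memory in the number of points per bucket. This is one plausible reading of the scalability claim above rather than a confirmed detail of the paper.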

Author information

Corresponding author

Correspondence to Hongmei Li.

Copyright information

© 2016 Springer Science+Business Media Singapore

About this paper

Cite this paper

Li, H., Hao, W., Zhang, H., Chen, G. (2016). Feature Representation Based on Improved Word-Vector Clustering Using AP and E2LSH. In: Zhang, L., Song, X., Wu, Y. (eds) Theory, Methodology, Tools and Applications for Modeling and Simulation of Complex Systems. AsiaSim 2016, SCS AutumnSim 2016. Communications in Computer and Information Science, vol 646. Springer, Singapore. https://doi.org/10.1007/978-981-10-2672-0_15

  • DOI: https://doi.org/10.1007/978-981-10-2672-0_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-2671-3

  • Online ISBN: 978-981-10-2672-0

  • eBook Packages: Computer Science, Computer Science (R0)
