Feature Representation Based on Improved Word-Vector Clustering Using AP and E2LSH

Li, Hongmei; Hao, Wenning; Zhang, Hongjun; Chen, Gang

doi:10.1007/978-981-10-2672-0_15

Part of the book series: Communications in Computer and Information Science (CCIS, volume 646)

Abstract

Deep learning models have shown clear advantages in feature representation and document retrieval. However, such models typically take only the most frequent words as input when learning latent features, which inevitably discards much of the useful information contained in documents, especially high-dimensional ones. We introduce a novel method based on word-vector clustering that produces low-dimensional semantic vectors of documents; these serve as the input to a deep learning model and improve the feature representation in its output layer. First, word vectors, a compact and distributed representation of words, are obtained by training a neural network language model with word2vec. Next, we present a modified word-vector clustering method based on locality-sensitive hashing and affinity propagation, which adapts and scales better to large, high-dimensional data. Each document is then represented by the set of cluster centers and passed as input to the deep learning model. Experimental results show that the proposed method improves the feature representation ability of the deep learning model and outperforms traditional methods on a document retrieval task.
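
The pipeline in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how hashing and affinity propagation are combined, so the sketch shows only an E2LSH-style p-stable hash, h(v) = floor((a·v + b) / w), used to bucket word vectors, and then treats each bucket as a cluster when building a document vector. The affinity propagation step is omitted. The function names, the parameters `k` and `w`, and the toy word vectors are all hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def make_e2lsh_hash(dim, k, w, rng):
    """Draw k p-stable hash functions h(v) = floor((a.v + b) / w).

    a has i.i.d. Gaussian entries (2-stable), b is uniform on [0, w].
    The k values are concatenated into one bucket key.
    """
    params = [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
        for _ in range(k)
    ]

    def hash_vector(v):
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
            for a, b in params
        )

    return hash_vector


def bucket_words(word_vectors, hash_vector):
    """Group words whose vectors share the same bucket key."""
    buckets = defaultdict(list)
    for word, vec in word_vectors.items():
        buckets[hash_vector(vec)].append(word)
    return dict(buckets)


def document_vector(tokens, word_to_cluster, n_clusters):
    """Normalised histogram of cluster ids over a document's known tokens."""
    counts = Counter(word_to_cluster[t] for t in tokens if t in word_to_cluster)
    total = sum(counts.values()) or 1
    return [counts.get(c, 0) / total for c in range(n_clusters)]


# Toy 3-dimensional "word vectors" (made up for illustration, not word2vec output).
word_vectors = {
    "cat": [0.90, 0.10, 0.00],
    "dog": [0.85, 0.15, 0.05],
    "car": [0.00, 0.20, 0.90],
    "bus": [0.05, 0.25, 0.95],
}

rng = random.Random(0)  # fixed seed so the hash functions are reproducible
hash_vector = make_e2lsh_hash(dim=3, k=2, w=1.0, rng=rng)
buckets = bucket_words(word_vectors, hash_vector)

# Assumption: each bucket acts as one cluster; a real system would refine buckets further.
word_to_cluster = {
    word: cid
    for cid, key in enumerate(sorted(buckets))
    for word in buckets[key]
}

doc_vec = document_vector(["cat", "dog", "car", "unknown"], word_to_cluster, len(buckets))
print(buckets)
print(doc_vec)
```

The document vector has one entry per cluster rather than one per vocabulary word, which is the dimensionality reduction the abstract refers to. Nearby vectors are likely, but not guaranteed, to share a bucket; a wider `w` or a smaller `k` makes collisions more frequent.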

Author information

Authors and Affiliations

Institute of Command Information Systems, PLA University of Science and Technology, Nanjing, China
Hongmei Li, Wenning Hao, Hongjun Zhang & Gang Chen

Corresponding author

Correspondence to Hongmei Li.

Editor information

Editors and Affiliations

Beihang University, Beijing, China
Lin Zhang
Beihang University, Beijing, China
Xiao Song
Beihang University, Beijing, China
Yunjie Wu

About this paper

Cite this paper

Li, H., Hao, W., Zhang, H., Chen, G. (2016). Feature Representation Based on Improved Word-Vector Clustering Using AP and E²LSH. In: Zhang, L., Song, X., Wu, Y. (eds) Theory, Methodology, Tools and Applications for Modeling and Simulation of Complex Systems. AsiaSim 2016, SCS AutumnSim 2016. Communications in Computer and Information Science, vol 646. Springer, Singapore. https://doi.org/10.1007/978-981-10-2672-0_15

DOI: https://doi.org/10.1007/978-981-10-2672-0_15
Published: 22 September 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2671-3
Online ISBN: 978-981-10-2672-0
eBook Packages: Computer Science, Computer Science (R0)
