Abstract
Word embedding methods have proven powerful at capturing word associations and have facilitated numerous applications by effectively bridging lexical gaps. Word semantics are encoded as vectors learned under n-gram language models, so these methods consider only word co-occurrences within a shallow sliding window. However, this language-modelling assumption ignores valuable associations between words at long distances, beyond n-gram coverage. In this paper, we argue that it is beneficial to jointly model both the surrounding context and flexible associative patterns, so that the model can capture both long-distance and intensive associations. We propose a novel approach that incorporates associated patterns into word embedding models via a joint training objective. We apply our model to query expansion in a document retrieval task. Experimental results show that the proposed method performs significantly better than state-of-the-art baseline models.
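To make the joint objective concrete, below is a minimal, hypothetical Python sketch, assuming a skip-gram style negative-sampling update in which mined pattern pairs contribute a second, down-weighted training term, and in which query expansion picks nearest neighbours in the learned space. The constant ALPHA, the toy word-ID pairs, and the helper names (sgd_pair, expand) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a joint objective: a sliding-window (skip-gram)
# term plus a pattern-association term for long-distance pairs.
# ALPHA and all toy data below are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, ALPHA = 1000, 50, 0.5          # ALPHA weighs the pattern term

W_in = rng.normal(scale=0.1, size=(VOCAB, DIM))   # target-word vectors
W_out = rng.normal(scale=0.1, size=(VOCAB, DIM))  # context-word vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_pair(w, c, label, lr=0.025):
    """One negative-sampling style update for a (target, context) pair."""
    score = sigmoid(W_in[w] @ W_out[c])
    grad = (score - label) * lr
    w_vec = W_in[w].copy()                 # cache before the in-place update
    W_in[w] -= grad * W_out[c]
    W_out[c] -= grad * w_vec

# Context pairs come from the usual sliding window; pattern pairs would come
# from mined association patterns linking words far beyond the n-gram window.
context_pairs = [(3, 7), (7, 12)]          # toy in-window positives
pattern_pairs = [(3, 480)]                 # toy long-distance pattern pair
negatives = [14, 998]                      # toy negative samples

for w, c in context_pairs:
    sgd_pair(w, c, label=1.0)              # standard skip-gram signal
    for n in negatives:
        sgd_pair(w, n, label=0.0)
for w, c in pattern_pairs:
    sgd_pair(w, c, label=1.0, lr=0.025 * ALPHA)  # down-weighted pattern signal

def expand(term_id, k=3):
    """Query expansion: the k nearest neighbours of a query term."""
    norms = W_in / np.linalg.norm(W_in, axis=1, keepdims=True)
    sims = norms @ norms[term_id]
    return np.argsort(-sims)[1:k + 1]      # skip the term itself
```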
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant No. 61132009) and the National Basic Research Program of China (973 Program, Grant No. 2013CB329303).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Liu, Q., Huang, H., Gao, Y., Wei, X., Geng, R. (2017). Leveraging Pattern Associations for Word Embedding Models. In: Candan, S., Chen, L., Pedersen, T., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol. 10177. Springer, Cham. https://doi.org/10.1007/978-3-319-55753-3_27
DOI: https://doi.org/10.1007/978-3-319-55753-3_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55752-6
Online ISBN: 978-3-319-55753-3
eBook Packages: Computer Science (R0)