Skip to main content

Short Text Hashing Improved by Integrating Topic Features and Tags

  • Conference paper
Neural Information Processing (ICONIP 2014)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 8835))

Included in the following conference series:

  • 2412 Accesses

Abstract

Hashing, as an efficient approach, has been widely used for large-scale similarity search. Unfortunately, many existing hashing methods based on observed keyword features are not effective for short texts due to the sparseness and shortness. Recently, some researchers try to construct semantic relationship using certain granularity topics. However, the topics of certain granularity are insufficient to preserve the optimal semantic similarity for different types of datasets. On the other hand, tag information should be fully exploited to enhance the similarity of related texts. We, therefore, propose a novel unified hashing approach that the optimal topic features can be selected automatically to be integrated with original features for preserving similarity, and tags are fully utilized to improve hash code learning. We carried out extensive experiments on one short text dataset and even one normal text dataset. The results demonstrate that our approach is effective and significantly outperforms baseline methods on several evaluation metrics.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Andoni, A., Indyk, P.: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In: FOCS 2006, pp. 459–468. IEEE (2006)

    Google Scholar 

  2. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15(6), 1373–1396 (2003)

    Article  MATH  Google Scholar 

  3. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. The Journal of Machine Learning Research 3, 993–1022 (2003)

    MATH  Google Scholar 

  4. Chen, M., Jin, X., Shen, D.: Short text classification improved by learning multi-granularity topics. In: IJCAI, pp. 1776–1781. AAAI Press (2011)

    Google Scholar 

  5. Kononenko, I.: Estimating attributes: analysis and extensions of relief. In: Bergadano, F., De Raedt, L. (eds.) ECML 1994. LNCS, vol. 784, pp. 171–182. Springer, Heidelberg (1994)

    Google Scholar 

  6. Lang, K.: Newsweeder: Learning to filter netnews. In: ICML. Citeseer (1995)

    Google Scholar 

  7. Lin, G., Shen, C., Suter, D., van den Hengel, A.: A general two-step approach to learning-based hashing. In: ICCV, pp. 2552–2559 (2013)

    Google Scholar 

  8. Phan, X.H., Nguyen, L.M., Horiguchi, S.: Learning to classify short and sparse text & web with hidden topics from large-scale data collections. In: WWW, pp. 91–100. ACM (2008)

    Google Scholar 

  9. Salakhutdinov, R., Hinton, G.: Semantic hashing. IJAR 50(7), 969–978 (2009)

    Google Scholar 

  10. Wang, Q., Zhang, D., Si, L.: Semantic hashing using tags and topic modeling. In: SIGIR, pp. 213–222. ACM (2013)

    Google Scholar 

  11. Weiss, Y., Torralba, A.: Spectral hashing. In: NIPS, vol. 9, pp. 1753–1760 (2008)

    Google Scholar 

  12. Zhang, D., Wang, J., Cai, D., Lu, J.: Extensions to self-taught hashing: Kernelisation and supervision. Practice 29, 38 (2010)

    Google Scholar 

  13. Zhang, D., Wang, J., Cai, D., Lu, J.: Laplacian co-hashing of terms and documents. In: Gurrin, C., He, Y., Kazai, G., Kruschwitz, U., Little, S., Roelleke, T., Rüger, S., van Rijsbergen, K. (eds.) ECIR 2010. LNCS, vol. 5993, pp. 577–580. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  14. Zhang, D., Wang, J., Cai, D., Lu, J.: Self-taught hashing for fast similarity search. In: SIGIR, pp. 18–25. ACM (2010)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Xu, J., Xu, B., Zhao, J., Tian, G., Zhang, H., Hao, H. (2014). Short Text Hashing Improved by Integrating Topic Features and Tags. In: Loo, C.K., Yap, K.S., Wong, K.W., Teoh, A., Huang, K. (eds) Neural Information Processing. ICONIP 2014. Lecture Notes in Computer Science, vol 8835. Springer, Cham. https://doi.org/10.1007/978-3-319-12640-1_36

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-12640-1_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12639-5

  • Online ISBN: 978-3-319-12640-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics