GOW-LDA: Applying Term Co-occurrence Graph Representation in LDA Topic Models Improvement

  • Conference paper
Computational Science and Technology (ICCST 2017)

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 488)

Abstract

In this paper, we present a novel approach to topic modeling that applies the word co-occurrence graph, or graph-of-words (GOW), to extract more informative latent topics from a large document corpus. The Latent Dirichlet Allocation (LDA) algorithm treats word occurrences as independent draws from probability distributions, so it fails to recognize relationships between terms. To overcome this limitation of traditional LDA, we propose GOW-LDA, which combines the GOW representation of documents, frequent subgraph extraction, and the distribution model of LDA. For evaluation, we compare the proposed model with traditional LDA across several classification algorithms on standardized datasets. The experimental results show that the proposed algorithm performs respectably.
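
To make the graph-of-words idea concrete, the following is a minimal sketch of how one document could be turned into a term co-occurrence graph. It is an illustration only, not the authors' implementation: the function name `build_graph_of_words`, the sliding-window size of 3, the undirected edges, and the sample sentence are all assumptions made here, since the abstract does not specify them.

```python
from collections import defaultdict


def build_graph_of_words(tokens, window=3):
    """Build an undirected co-occurrence graph for one document.

    Hypothetical helper: returns {frozenset({u, v}): count}, where count is
    the number of times u and v appear within `window` consecutive tokens.
    The window size is an assumption, not a value taken from the paper.
    """
    edges = defaultdict(int)
    for i, term in enumerate(tokens):
        # Link the current term to the next (window - 1) terms.
        for other in tokens[i + 1 : i + window]:
            if other != term:  # skip self-loops
                edges[frozenset((term, other))] += 1
    return dict(edges)


tokens = "topic models extract latent topic structure".split()
graph = build_graph_of_words(tokens, window=3)

for pair, count in sorted(graph.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pair), count)
```

In a pipeline of the kind the abstract describes, graphs like this one would be built per document and then passed to a frequent subgraph miner; that step and the LDA step are not shown here.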

Acknowledgement

This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCMC) under grant number B2017-26-02.

Author information

Corresponding author

Correspondence to Phuc Do.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Pham, P., Do, P., Ta, C.D.C. (2018). GOW-LDA: Applying Term Co-occurrence Graph Representation in LDA Topic Models Improvement. In: Alfred, R., Iida, H., Ag. Ibrahim, A., Lim, Y. (eds) Computational Science and Technology. ICCST 2017. Lecture Notes in Electrical Engineering, vol 488. Springer, Singapore. https://doi.org/10.1007/978-981-10-8276-4_40

  • DOI: https://doi.org/10.1007/978-981-10-8276-4_40

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-8275-7

  • Online ISBN: 978-981-10-8276-4

  • eBook Packages: Engineering, Engineering (R0)
