Abstract
In this paper, we present a novel approach to topic model exploration that applies a word co-occurrence graph, or graph-of-words (GOW), to extract more informative latent topics from a large document corpus. The Latent Dirichlet Allocation (LDA) algorithm considers word occurrences independently through probabilistic distributions, so it fails to recognize relationships between terms. To overcome this disadvantage of traditional LDA, we propose a novel approach called GOW-LDA, which combines the GOW representation of documents, frequent subgraph extraction, and the distribution model of LDA. For evaluation, we compare the proposed model with traditional LDA under different classification algorithms, using standardized datasets. The experimental results show that the proposed algorithm achieves respectable performance.
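To make the pipeline described above more concrete, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes a sliding window of three tokens for co-occurrence, treats single edges as the simplest possible frequent subgraphs, and appends each frequent edge to the document as an extra pseudo-term that a standard LDA implementation could then consume. The function names (`build_gow`, `frequent_edges`, `augment`), the window size, and the support threshold are all hypothetical choices made for illustration; the abstract does not specify them.

```python
from collections import Counter


def build_gow(tokens, window=3):
    """Return the undirected graph-of-words edges of one document.

    Two distinct terms are linked when they occur within `window`
    consecutive tokens (assumed window size, not from the paper).
    """
    edges = set()
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != term:
                edges.add(frozenset((term, other)))
    return edges


def frequent_edges(docs, window=3, min_support=2):
    """Keep edges that appear in at least `min_support` documents.

    Single edges stand in for frequent subgraphs here; a real system
    would use a dedicated frequent subgraph miner.
    """
    support = Counter()
    for tokens in docs:
        support.update(build_gow(tokens, window))
    return {edge for edge, count in support.items() if count >= min_support}


def augment(tokens, frequent, window=3):
    """Append each frequent edge of the document as a pseudo-term."""
    extra = [
        "_".join(sorted(edge))
        for edge in build_gow(tokens, window)
        if edge in frequent
    ]
    return tokens + sorted(extra)


docs = [
    "topic model latent topic".split(),
    "latent topic model corpus".split(),
    "graph mining corpus".split(),
]
frequent = frequent_edges(docs)
augmented = [augment(d, frequent) for d in docs]
```

The augmented token lists could be passed to any bag-of-words LDA implementation, so that co-occurring term pairs contribute to the topic distributions alongside individual words.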
Acknowledgement
This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCMC) under the grant number B2017-26-02.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Pham, P., Do, P., Ta, C.D.C. (2018). GOW-LDA: Applying Term Co-occurrence Graph Representation in LDA Topic Models Improvement. In: Alfred, R., Iida, H., Ag. Ibrahim, A., Lim, Y. (eds) Computational Science and Technology. ICCST 2017. Lecture Notes in Electrical Engineering, vol 488. Springer, Singapore. https://doi.org/10.1007/978-981-10-8276-4_40
DOI: https://doi.org/10.1007/978-981-10-8276-4_40
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8275-7
Online ISBN: 978-981-10-8276-4
eBook Packages: Engineering, Engineering (R0)