A co-occurrence based approach of automatic keyword expansion using mass diffusion

Yin, Xicheng; Wang, Hongwei; Yin, Pei; Zhu, Hengmin; Zhang, Zhenyu

doi:10.1007/s11192-020-03601-7

Published: 01 July 2020

Scientometrics, Volume 124, pages 1885–1905 (2020)

Xicheng Yin¹, Hongwei Wang¹ (ORCID: orcid.org/0000-0003-0814-3498), Pei Yin², Hengmin Zhu³ & Zhenyu Zhang¹

Abstract

Prior keyword expansion methods often improve their performance by drawing on external knowledge. Given a set of initial keywords, this paper proposes a novel method that expands semantically or conceptually related keywords from a domain corpus by means of mass diffusion. A bipartite word network is constructed from the co-occurrence relations between the initial keywords and candidate words, and the expanded keywords are identified by a two-step mass diffusion carried out on this network. Experimental results show that the proposed method outperforms both a typical statistics-based approach and a graph-based approach. Our research is expected to complement the theoretical framework of keyword expansion and is applicable to query expansion, thesaurus construction, and text clustering.
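The abstract does not spell out the diffusion rules. The sketch below therefore uses the standard two-step mass-diffusion (ProbS) process from bipartite-network recommendation, under several assumptions:

- initial keywords play the role of "users";
- candidate words play the role of "objects";
- co-occurrence counts are the edge weights.

The function name `mass_diffusion`, the toy words and counts, and the choice to seed resource on the candidates that co-occur with one target keyword are hypothetical illustrations, not the authors' procedure.

```python
from typing import List, Sequence


def mass_diffusion(adj: Sequence[Sequence[float]], seed: Sequence[float]) -> List[float]:
    """Two-step mass diffusion (ProbS-style) on a weighted bipartite network.

    Hypothetical layout: adj[i][j] >= 0 is the co-occurrence weight between
    candidate word i and initial keyword j; seed[i] is the resource initially
    placed on candidate i. Each node splits its resource among its neighbours
    in proportion to edge weight, so total resource is conserved, except that
    a candidate with no edges has nowhere to send its seed.
    """
    n_cand, n_kw = len(adj), len(adj[0])
    cand_strength = [sum(row) for row in adj]
    kw_strength = [sum(adj[i][j] for i in range(n_cand)) for j in range(n_kw)]

    # Step 1: candidate words -> initial keywords.
    kw_resource = [0.0] * n_kw
    for i in range(n_cand):
        if cand_strength[i] > 0:
            for j in range(n_kw):
                kw_resource[j] += seed[i] * adj[i][j] / cand_strength[i]

    # Step 2: initial keywords -> candidate words.
    final = [0.0] * n_cand
    for j in range(n_kw):
        if kw_strength[j] > 0:
            for i in range(n_cand):
                final[i] += kw_resource[j] * adj[i][j] / kw_strength[j]
    return final


if __name__ == "__main__":
    # Toy co-occurrence counts (invented for illustration only).
    candidates = ["graph", "network", "ranking", "corpus"]
    keywords = ["keyword", "extraction", "expansion"]
    adj = [
        [3, 1, 0],  # graph
        [2, 0, 1],  # network
        [0, 2, 0],  # ranking
        [0, 0, 1],  # corpus
    ]
    # Assumed seeding: unit resource on candidates linked to the target keyword.
    target = keywords.index("expansion")
    seed = [1.0 if row[target] > 0 else 0.0 for row in adj]
    scores = mass_diffusion(adj, seed)
    for word, score in sorted(zip(candidates, scores), key=lambda p: -p[1]):
        print(f"{word}: {score:.3f}")
    # network: 0.933
    # corpus: 0.667
    # graph: 0.400
    # ranking: 0.000
```

In this toy network, "graph" never co-occurs with "expansion". It still receives resource because it shares the keyword "keyword" with "network". This indirect reach through shared neighbours is what a diffusion-based ranking adds over raw co-occurrence counts.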

Acknowledgements

This work was supported by the National Natural Science Foundation of China (71771177, 71601119, 71874088), Innovation Fund for University Production, Education and Research from China’s Ministry of Education (2019J01012), and International Exchange Program for Graduate Students, Tongji University (201902027). The authors thank the editor and the anonymous reviewers for their helpful comments and suggestions in improving this manuscript.

Author information

Authors and Affiliations

School of Economics and Management, Tongji University, Shanghai, 200092, China
Xicheng Yin, Hongwei Wang & Zhenyu Zhang
Business School, University of Shanghai for Science and Technology, Shanghai, 200093, China
Pei Yin
School of Management, Nanjing University of Posts and Telecommunications, Nanjing, 210003, China
Hengmin Zhu

Corresponding author

Correspondence to Hongwei Wang.

Appendices

Appendix 1

See Table 6.

Table 6 Initial keyword list of four subjects (Dataset: Semeval)

Appendix 2

See Table 7.

Table 7 The keywords retrieved by four methods (Dataset: Semeval)

About this article

Cite this article

Yin, X., Wang, H., Yin, P. et al. A co-occurrence based approach of automatic keyword expansion using mass diffusion. Scientometrics 124, 1885–1905 (2020). https://doi.org/10.1007/s11192-020-03601-7

Received: 27 August 2019
Published: 01 July 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s11192-020-03601-7
