
Improving Matching Process with Expanding and Classifying Criterial Keywords leveraging Word Embedding and Hierarchical Clustering Methods


Matching processes, such as the selection of producers of advertising content corresponding to specific products or the screening of job applicants based on predefined requirements, have become important operations required by enterprises. Such problems generally include several keywords representing the matching criteria, but it is difficult for enterprises to expand and classify criterial keywords properly to improve the matching performance. This study proposes solutions to this issue by extracting criterial keywords from social networking services (SNSs) based on word embedding and by classifying the obtained keywords via hierarchical clustering. This approach will enable enterprises to gather and prioritize criterial keywords more accurately to improve their matching processes.
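The two steps named in the abstract, expanding a seed keyword through embedding similarity and then grouping the resulting keywords by hierarchical clustering, can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional vectors in `EMBEDDINGS` are made-up toy values standing in for embeddings trained on SNS text, the function names `expand_keywords` and `agglomerative_clusters` are hypothetical, and the choice of cosine distance with average linkage is an assumption made only for illustration.

```python
import math

# Hypothetical toy embeddings; a real pipeline would load vectors
# trained on SNS text (for example with Word2Vec).
EMBEDDINGS = {
    "camera":  [0.9, 0.1, 0.0],
    "lens":    [0.8, 0.2, 0.1],
    "tripod":  [0.7, 0.3, 0.0],
    "recipe":  [0.1, 0.9, 0.1],
    "cooking": [0.0, 0.8, 0.2],
    "baking":  [0.1, 0.7, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_keywords(seed, embeddings, top_n=2):
    """Return the top_n words whose vectors are closest to the seed keyword."""
    scored = [
        (word, cosine(embeddings[seed], vec))
        for word, vec in embeddings.items()
        if word != seed
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:top_n]]


def agglomerative_clusters(words, embeddings, n_clusters):
    """Bottom-up clustering: repeatedly merge the two closest clusters.

    Distance between clusters is the average cosine distance over all
    cross-cluster word pairs (average linkage, assumed for this sketch).
    """
    clusters = [[w] for w in words]

    def distance(a, b):
        pairs = [(x, y) for x in a for y in b]
        total = sum(1.0 - cosine(embeddings[x], embeddings[y]) for x, y in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        candidates = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(candidates, key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    print(expand_keywords("camera", EMBEDDINGS))
    print(agglomerative_clusters(list(EMBEDDINGS), EMBEDDINGS, n_clusters=2))
```

With these toy vectors, the seed "camera" expands to its two nearest neighbours, and cutting the hierarchy at two clusters separates the photography terms from the cooking terms. In practice the cut level and the linkage criterion would need to be chosen against the enterprise's own matching criteria.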





Author information

Corresponding author

Correspondence to Yutaka Iwakami.

Ethics declarations

Conflict of Interest Statement

On behalf of all authors, the corresponding author states that there is no conflict of interest to declare.

About this article

Cite this article

Iwakami, Y., Takuma, H. & Iwashita, M. Improving Matching Process with Expanding and Classifying Criterial Keywords leveraging Word Embedding and Hierarchical Clustering Methods. Rev Socionetwork Strat 14, 193–204 (2020).

Keywords

  • Matching process
  • Word2Vec
  • Hierarchical clustering
  • NLP
  • SNS
  • Semantic analysis