Mining Labor Market Requirements Using Distributional Semantic Models and Deep Learning

  • Dmitriy Botov (corresponding author)
  • Julius Klenin
  • Andrey Melnikov
  • Yuri Dmitrin
  • Ivan Nikolaev
  • Mikhail Vinel
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 354)


This article describes a new method for analyzing labor market requirements. Job listings from online recruitment platforms are matched with professional standards in order to weigh the importance of particular professional functions and requirements, and to enrich the general concepts of professional standards with real labor market requirements. Our approach aims to close the gap between professional standards and the fast-changing requirements of developing branches of the economy. First, we determine the professions for each job description using a multi-label classifier based on convolutional neural networks. Second, we solve the task of concept matching between job descriptions and the standards for the respective professions by applying distributional semantic models; in this task, the averaged word2vec model outperformed the other vector space models we evaluated. Finally, we experiment with expanding the general vocabulary of professional standards with the most frequent unigrams and bigrams occurring in the matching job descriptions. Performance is evaluated on a representative corpus of job listings and professional standards in the IT field.
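The concept-matching and vocabulary-expansion steps in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation, and the CNN classification step is not covered. The toy two-dimensional embeddings, whitespace tokenization, and the helper names `average_vector`, `cosine`, `match_requirements`, and `top_ngrams` are hypothetical. A text vector is taken as the mean of the word vectors of its in-vocabulary tokens. Each job requirement is assigned the professional-standard function with the highest cosine similarity. Candidate vocabulary is taken from the most frequent unigrams and bigrams in the matched job descriptions. In practice the vectors would come from a trained word2vec model, and Russian text would normally be lemmatized and stripped of stop words first.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def average_vector(tokens: Sequence[str], embeddings: Dict[str, Vector]) -> Optional[List[float]]:
    """Average the vectors of in-vocabulary tokens; return None if no token is known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_requirements(
    requirements: Sequence[str],
    functions: Sequence[str],
    embeddings: Dict[str, Vector],
) -> List[Tuple[str, Optional[str], Optional[float]]]:
    """Pair each requirement with its most similar function (None, None if nothing comparable)."""
    func_vecs = [(f, average_vector(f.split(), embeddings)) for f in functions]
    matches = []
    for req in requirements:
        req_vec = average_vector(req.split(), embeddings)
        best, best_score = None, None
        if req_vec is not None:
            for func, func_vec in func_vecs:
                if func_vec is None:
                    continue  # function text has no known words
                score = cosine(req_vec, func_vec)
                if best_score is None or score > best_score:
                    best, best_score = func, score
        matches.append((req, best, best_score))
    return matches


def top_ngrams(texts: Sequence[str], k: int) -> List[Tuple[str, int]]:
    """Count unigrams and bigrams over all texts and return the k most frequent."""
    counts: Counter = Counter()
    for text in texts:
        tokens = text.split()
        counts.update(tokens)
        counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return counts.most_common(k)


if __name__ == "__main__":
    # Hypothetical toy embeddings standing in for a trained word2vec model.
    emb = {
        "python": [1.0, 0.0], "django": [0.9, 0.1], "develop": [0.7, 0.3],
        "sql": [0.1, 0.9], "database": [0.0, 1.0], "design": [0.2, 0.8],
    }
    functions = ["develop python code", "design database"]
    for req, func, score in match_requirements(["django python", "sql"], functions, emb):
        print(f"{req!r} -> {func!r} ({score:.3f})")
    print(top_ngrams(["python django developer", "python developer"], 3))
```

With these toy vectors, "django python" is matched to "develop python code" and "sql" to "design database". A trained model could replace the dictionary with any mapping that supports `word in embeddings` and `embeddings[word]`, such as a gensim `KeyedVectors` object. A similarity threshold, which this sketch does not assume, would normally decide whether a best match is accepted at all.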


Keywords: Natural language processing · Distributional semantic model · Deep learning · Convolutional neural networks · Multilabel classification · Semantic similarity · Information extraction · Labor market requirements · Professional standards



Research has been supported by RFBR grant No. 18-47-860013 r_a "Intelligent system for the formation of educational programs based on neural network models of natural language to meet the requirements of the digital economy". We are grateful to the students and lecturers of Chelyabinsk State University for their help in preparing and labeling data and in conducting experiments. We are also grateful to the head and IT specialists of the Intersvyaz company, who provided the necessary computational platform for the experiments.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Dmitriy Botov (1), corresponding author
  • Julius Klenin (1)
  • Andrey Melnikov (2)
  • Yuri Dmitrin (1)
  • Ivan Nikolaev (1)
  • Mikhail Vinel (1)

  1. Chelyabinsk State University, Chelyabinsk, Russia
  2. Ugra Research Institute of Information Technologies, Khanty-Mansiysk, Russia