Building a soft skill taxonomy from job openings


Soft skills are crucial for candidates in the job market, and analyzing the soft skills listed in job ads can help identify the ones recruiters value most. Such analysis benefits from a taxonomy that supports extracting soft skills from ad text. However, prior work has focused primarily on building hard skill taxonomies, and methodologies for building hard skill taxonomies do not work well for soft skills because job ads use widely varying terminology to list them. Moreover, prior work has mainly extracted soft skills from job ads using simple keyword search, which can fail to detect the different forms in which soft skills appear. In this paper, we develop TaxoSoft, a methodology for building a soft skill taxonomy that uses DBpedia and Word2Vec to find terms related to different soft skills. TaxoSoft also uses social network analysis to build a hierarchy of terms. We use this method to build soft skill taxonomies in both English and French. We evaluate TaxoSoft on a sample of job ads and find that it achieves an F-score of 0.84, while taxonomies developed in prior work achieve an F-score of only 0.54. We then use the proposed methodology to analyze soft skills listed in job ads in order to find the skills most required in the American and Moroccan job markets. Our findings can offer insights to universities about the top soft skills requested in the job market.
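To make the keyword-search limitation and the F-score comparison concrete, the following is a minimal Python sketch on invented toy data. It is not the TaxoSoft pipeline: the ad text, the gold labels, the variant lists, and the helper names `f_score` and `extract_skills` are all assumptions made for illustration, and the variant lists are written by hand here rather than derived from DBpedia or Word2Vec. The F-score is taken to be the usual harmonic mean of precision and recall.

```python
def f_score(predicted, gold):
    """Harmonic mean of precision and recall over two sets of labels."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def extract_skills(ad_text, taxonomy):
    """Return every skill with at least one surface form found in the ad."""
    text = ad_text.lower()
    return {
        skill
        for skill, forms in taxonomy.items()
        if any(form in text for form in forms)
    }


# Hypothetical toy data, invented for illustration only.
AD = ("We need a team player with strong communication skills "
      "who can lead projects.")
GOLD = {"teamwork", "communication", "leadership"}

# Plain keyword search: each skill is matched only by its own name.
KEYWORDS_ONLY = {skill: [skill] for skill in GOLD}

# Taxonomy-style lookup: each skill also lists related surface forms.
# These variants are hand-written assumptions, not output of the paper's method.
WITH_VARIANTS = {
    "teamwork": ["teamwork", "team player"],
    "communication": ["communication", "communicates"],
    "leadership": ["leadership", "lead projects"],
}

keyword_hits = extract_skills(AD, KEYWORDS_ONLY)
variant_hits = extract_skills(AD, WITH_VARIANTS)

print("keyword search:", sorted(keyword_hits), f_score(keyword_hits, GOLD))
print("with variants: ", sorted(variant_hits), f_score(variant_hits, GOLD))
```

On this toy ad, keyword search finds only "communication", while the variant-aware lookup also recovers "teamwork" and "leadership". The numbers it prints describe the toy example only and say nothing about the 0.84 and 0.54 reported above.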








This work is supported in part by the United States Agency for International Development (USAID) under grant AID-OAAA-11-00012 and by a Google Africa PhD fellowship. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of USAID or Google. The authors would like to thank Mehdi Zakroum and Ibtissam Makdoun for useful comments and discussion.

Author information



Corresponding author

Correspondence to Imane Khaouja.




See Tables 13, 14 and 15.

Table 13 Soft skill taxonomy in English
Table 14 Soft skill taxonomy in French
Table 15 List of soft skills


Cite this article

Khaouja, I., Mezzour, G., Carley, K.M. et al. Building a soft skill taxonomy from job openings. Soc. Netw. Anal. Min. 9, 43 (2019).
