WoLMIS: a labor market intelligence system for classifying web job vacancies


In recent decades, an increasing number of employers and job seekers have been relying on Web resources to get in touch and to find a job. If appropriately retrieved and analyzed, the huge number of job vacancies available today on on-line job portals can provide detailed and valuable information about Web Labor Market dynamics and trends. In particular, this information can be useful to all actors, public and private, who play a role in the European Labor Market. This paper presents WoLMIS, a system aimed at collecting and automatically classifying multilingual Web job vacancies with respect to a standard taxonomy of occupations. The proposed system has been developed for the Cedefop European agency, which supports the development of European Vocational Education and Training (VET) policies and contributes to their implementation. In particular, WoLMIS allows analysts and Labor Market specialists to make sense of Labor Market dynamics and trends in several European countries, overcoming linguistic boundaries across national borders. A detailed experimental evaluation is also provided for a set of about 2 million job vacancies, collected from a set of UK and Irish Web job sites from June to September 2015.

    The Commission Communication “New Skills for New Jobs” (COM(2008) 868, 16.12.2008)

    The Commission Communication “An Agenda for new skills and jobs: A European contribution toward full employment” (COM(2010) 682, 23.11.2010)

    The Commission Communication “A New Skills Agenda for Europe” COM(2016) 381/2, available at

    Real-time Labor Market information on skill requirements: feasibility study and working prototype. Cedefop Reference number AO/RPA/VKVET-NSOFRO/Real-time LMI/010/14. Contract notice 2014/S 141-252026 of 15/07/2014

    For more information on SOC2000, the interested reader can refer to SOC2000 (2016).

    The previously cited extension of the Standard Occupational Classification (SOC) system developed by the U.S. Bureau of Labor Statistics.

    As illustrated in Table 4 (Section 5.2), 10% of the (most representative) title words are enough to achieve 80% classification accuracy. Nevertheless, the table shows that the best performance is achieved using all the title words.

    The market in which workers find employment, employers find available workers, and wage rates are determined.

    The European Network on Regional Labor Market Monitoring (ENRLMM 2016).

    The European classification system for economic sectors, see

    Generally speaking, an n-gram is a sequence of n consecutive words.
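As a minimal illustration of this definition, the sketch below extracts word n-grams from a vacancy title. The function name `word_ngrams`, the lowercasing and whitespace tokenization, and the sample title are assumptions made for the example; they are not taken from the WoLMIS pipeline.

```python
def word_ngrams(text, n):
    """Return the n-grams of `text` as tuples of n consecutive words.

    Tokenization here is a plain lowercase whitespace split, which is an
    assumption for illustration only.
    """
    words = text.lower().split()
    # A text with fewer than n words yields an empty list.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# Hypothetical vacancy title, used only to show the output shape.
title = "Senior Java Software Developer"
unigrams = word_ngrams(title, 1)
bigrams = word_ngrams(title, 2)
```

With this title, `bigrams` holds three pairs: ("senior", "java"), ("java", "software") and ("software", "developer").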

    The visiting frequency was tuned for each Web site taking into account the publishing rate, the average time an advertisement is kept on-line, and suggestions from the webmasters who agreed to collaborate with the project.

    Actually, there are some vacancies, mostly looking for language teachers.

    According to (ISCO 2012), “Water and firewood collectors” gather water and firewood and transport them on foot or using hand or animal carts.

    sklearn.svm.LinearSVC is a wrapper around the liblinear library (Fan et al. 2008), while sklearn.svm.SVC is a wrapper around the libsvm library (Chang & Lin 2011).

    Also known as weighted averaging.

    A 3-layer Neural Network (i.e., with 1 hidden layer) is able to properly address linear classification problems (Jain et al. 1996; Lippmann 1987).

    The lower quartile is the 25th percentile while the upper quartile is the 75th percentile.
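A short sketch of how the two quartiles can be computed with Python's standard library. The sample values are made up, and the choice of the `inclusive` interpolation method is an assumption; other percentile conventions can give slightly different cut points.

```python
import statistics

# Made-up sample, for illustration only.
values = [1, 2, 3, 4, 5]

# quantiles(..., n=4) returns the three cut points that split the data
# into four equal-probability groups: 25th, 50th and 75th percentiles.
lower_quartile, median, upper_quartile = statistics.quantiles(
    values, n=4, method="inclusive"
)

# The interquartile range is the spread between the two quartiles.
interquartile_range = upper_quartile - lower_quartile
```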

    For an updated list, see


This work was supported by the Cedefop agency as part of the project “Real-time Labor Market information on skill requirements: feasibility study and working prototype”. Cedefop Reference number AO/RPA/VKVET-NSOFRO/Real-time LMI/010/14. Contract notice 2014/S 141-252026 of 15/07/2014.

Correspondence to Fabio Mercorio or Gabriella Pasi.

Boselli, R., Cesarini, M., Marrara, S. et al. WoLMIS: a labor market intelligence system for classifying web job vacancies. J Intell Inf Syst 51, 477–502 (2018).

