Similarity Computation Exploiting the Semantic and Syntactic Inherent Structure Among Job Titles

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10601)


Solutions providing hiring analytics involve mapping company-provided job descriptions to a standard job framework, which requires computing a similarity score between two jobs. Most systems that do so apply document similarity methods to all pairs of provided job descriptions. This approach can be computationally expensive, and it is adversely affected by the quality of the job descriptions, which often include information not relevant to the job or to candidate qualifications. We propose a method that narrows down the pairs of job descriptions to be compared by comparing job titles first. The method rests on the observation that each job title can be decomposed into three components: domain, function, and attribute. Our proposal focuses on training machine learning models to identify these three components in any given job title. We then perform a semantic match between the corresponding components of two titles and combine the match scores into a composite similarity score for any pair of job titles. The elegance of this solution lies in the fact that job titles are the most concise definition of a job, so the resulting matches can easily be verified by human experts. Our results show that the approach produces highly reliable matches.
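The pipeline described above (decompose each title, match component by component, combine the scores) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the toy lexicons stand in for the trained component-identification models, plain token overlap (Jaccard) stands in for the semantic match, and the weights and the names `decompose`, `jaccard`, and `title_similarity` are hypothetical rather than taken from the paper.

```python
# Hypothetical toy lexicons; the paper trains machine learning models
# to identify the components instead of using fixed word lists.
DOMAIN_WORDS = {"software", "sales", "marketing", "finance", "hardware"}
FUNCTION_WORDS = {"engineer", "manager", "analyst", "developer", "consultant"}
ATTRIBUTE_WORDS = {"senior", "junior", "lead", "associate", "principal"}

# Assumed weights for combining the per-component match scores.
WEIGHTS = {"domain": 0.4, "function": 0.4, "attribute": 0.2}


def decompose(title: str) -> dict[str, set[str]]:
    """Assign each known token of a job title to one of the three components."""
    parts: dict[str, set[str]] = {"domain": set(), "function": set(), "attribute": set()}
    for token in title.lower().split():
        if token in DOMAIN_WORDS:
            parts["domain"].add(token)
        elif token in FUNCTION_WORDS:
            parts["function"].add(token)
        elif token in ATTRIBUTE_WORDS:
            parts["attribute"].add(token)
    return parts


def jaccard(a: set[str], b: set[str]) -> float:
    """Token overlap, used here as a stand-in for a semantic match score."""
    if not a and not b:
        return 1.0  # both components absent: treated as a match
    return len(a & b) / len(a | b)


def title_similarity(title_a: str, title_b: str) -> float:
    """Weighted sum of the per-component match scores of two job titles."""
    parts_a, parts_b = decompose(title_a), decompose(title_b)
    return sum(w * jaccard(parts_a[k], parts_b[k]) for k, w in WEIGHTS.items())


if __name__ == "__main__":
    print(title_similarity("Senior Software Engineer", "Junior Software Developer"))
```

Because each component is scored separately, a human reviewer can see which part of two titles agreed (here only the domain) and which did not, which is the kind of verifiability the abstract points to. Only title pairs scoring above some chosen threshold would then need the more expensive comparison of full job descriptions.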



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. IBM Research Lab, New Delhi, India
  2. IBM Talent Management Solutions, Portsmouth, UK
