Data Science and Big Data Analytics at Career Builder



In the online job recruitment domain, matching job seekers with relevant jobs is critical for closing the skills gap. When dealing with millions of resumes and job postings, such matching analytics involve several Big Data challenges. At CareerBuilder, we tackle these challenges by (i) classifying large datasets of job ads and job seeker resumes into occupation categories and (ii) providing a scalable framework that facilitates executing web services for Big Data applications.

In this chapter, we discuss two systems currently in production at CareerBuilder that support our goal of closing the skills gap. These systems also power several downstream applications and labor market analytics products. We first discuss Carotene, a large-scale, machine learning-based, semi-supervised job title classification system. Carotene has a coarse- and fine-grained cascade architecture and a clustering-based job title taxonomy discovery component that facilitates discovering more fine-grained job titles than those in the industry-standard occupation taxonomy. We then describe CARBi, a system for developing and deploying Big Data applications for understanding and improving job-resume dynamics. CARBi consists of two components: (i) WebScalding, a library that provides quick access to commonly used datasets, database tables, data formats, web services, and helper functions to access and transform data, and (ii) ScriptDB, a standalone application that helps developers execute and manage Big Data projects. The system is built so that every job developed using CARBi can be executed in both local and cluster modes.
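The coarse-to-fine cascade idea mentioned above can be illustrated with a minimal Python sketch. This is not Carotene's implementation: the training titles, the category names, and the token-overlap scoring are all hypothetical stand-ins chosen for brevity, and a production system would use trained classifiers at each stage. The sketch only shows the control flow, in which a coarse stage picks a broad occupation category and a fine stage then chooses among the job titles belonging to that category.

```python
from collections import Counter

# Hypothetical toy data: (job title text, coarse category, fine-grained label).
TRAINING = [
    ("java software developer", "computer", "software developer"),
    ("senior software engineer", "computer", "software developer"),
    ("database administrator oracle", "computer", "database administrator"),
    ("sql database admin", "computer", "database administrator"),
    ("registered nurse icu", "healthcare", "registered nurse"),
    ("staff nurse pediatrics", "healthcare", "registered nurse"),
    ("pharmacy technician retail", "healthcare", "pharmacy technician"),
    ("certified pharmacy tech", "healthcare", "pharmacy technician"),
]


def build_profiles(examples):
    """Aggregate token counts per label into a bag-of-words profile."""
    profiles = {}
    for text, label in examples:
        profiles.setdefault(label, Counter()).update(text.split())
    return profiles


def best_label(text, profiles):
    """Return the label whose profile shares the most token mass with the text.

    Token overlap is an illustrative stand-in for a trained classifier.
    Labels are sorted first so that ties are broken deterministically.
    """
    tokens = text.lower().split()
    return max(sorted(profiles),
               key=lambda label: sum(profiles[label][t] for t in tokens))


# Coarse stage: one profile per broad category.
COARSE = build_profiles([(text, coarse) for text, coarse, _ in TRAINING])

# Fine stage: a separate set of profiles for each coarse category.
FINE = {
    coarse: build_profiles([(t, f) for t, c, f in TRAINING if c == coarse])
    for coarse in COARSE
}


def classify(title):
    """Run the cascade: pick a coarse category, then a fine label within it."""
    coarse = best_label(title, COARSE)
    fine = best_label(title, FINE[coarse])
    return coarse, fine


if __name__ == "__main__":
    print(classify("Oracle database engineer"))
    print(classify("pediatrics nurse"))
```

Because the fine stage only compares labels inside the category chosen by the coarse stage, each fine-grained model stays small, at the cost that a coarse-stage mistake cannot be corrected later in the cascade.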


Keywords: Relevance Vector Machine, Hierarchical Classifier, Cascade Classifier, Script File, Standard Occupational Classification
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Data Science R&D, CareerBuilder, Norcross, USA
