Data Science and Big Data Analytics at Career Builder

Javed, Faizan; Jacob, Ferosh

doi:10.1007/978-3-319-25313-8_6

Chapter
First Online: 13 January 2016

Abstract

In the online job recruitment domain, matching job seekers with relevant jobs is critical for closing the skills gap. When dealing with millions of resumes and job postings, such matching analytics involve several Big Data challenges. At CareerBuilder, we tackle these challenges by (i) classifying large datasets of job ads and job seeker resumes into occupation categories and (ii) providing a scalable framework that facilitates executing web services for Big Data applications.

In this chapter, we discuss two systems currently in production at CareerBuilder that support our goal of closing the skills gap. These systems also power several downstream applications and labor market analytics products. We first discuss Carotene, a large-scale, machine learning-based, semi-supervised job title classification system. Carotene has a coarse- and fine-grained cascade architecture and a clustering-based job title taxonomy discovery component that facilitates discovering more fine-grained job titles than those in the industry-standard occupation taxonomy. We then describe CARBi, a system for developing and deploying Big Data applications for understanding and improving job-resume dynamics. CARBi consists of two components: (i) WebScalding, a library that provides quick access to commonly used datasets, database tables, data formats, web services, and helper functions to access and transform data, and (ii) ScriptDB, a standalone application that helps developers execute and manage Big Data projects. CARBi is built so that every job developed with it can be executed in both local and cluster modes.
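
To make the idea of a coarse and fine-grained cascade concrete, the following is a minimal sketch and not Carotene's implementation. The abstract states only that the classifier is a cascade, so everything else here is an assumption for illustration: the two-level TAXONOMY, the example phrases, and the token-overlap scoring, which stands in for the machine learning models the chapter describes. The point it shows is the control flow: a coarse stage picks a broad occupation group, and a fine stage then chooses only among the job titles inside that group.

```python
from collections import Counter

# Hypothetical two-level taxonomy (an assumption, not from the chapter):
# coarse occupation group -> fine-grained job title -> example phrases.
TAXONOMY = {
    "software": {
        "java developer": ["senior java developer", "java software engineer"],
        "data scientist": ["machine learning data scientist", "data scientist analytics"],
    },
    "healthcare": {
        "registered nurse": ["registered nurse icu", "rn staff nurse"],
        "pharmacist": ["retail pharmacist", "clinical pharmacist"],
    },
}


def tokens(text):
    """Lowercase a string and split it on whitespace."""
    return text.lower().split()


def profile(phrases):
    """Count how often each token occurs across a list of phrases."""
    return Counter(tok for phrase in phrases for tok in tokens(phrase))


def score(query, prof):
    """Toy relevance score: sum of profile counts for the query's tokens."""
    return sum(prof[tok] for tok in tokens(query))


def classify(title):
    """Return (coarse_group, fine_title) for a raw job title string."""
    # Coarse stage: compare the query against one profile per group,
    # built from every example phrase under that group.
    coarse_profiles = {
        group: profile([p for phrases in titles.values() for p in phrases])
        for group, titles in TAXONOMY.items()
    }
    group = max(coarse_profiles, key=lambda g: score(title, coarse_profiles[g]))

    # Fine stage: only titles inside the chosen group are candidates,
    # which keeps the second decision small.
    fine_profiles = {t: profile(ps) for t, ps in TAXONOMY[group].items()}
    fine = max(fine_profiles, key=lambda t: score(title, fine_profiles[t]))
    return group, fine


if __name__ == "__main__":
    print(classify("Sr Java Developer"))  # ('software', 'java developer')
    print(classify("ICU staff nurse"))    # ('healthcare', 'registered nurse')
```

In a cascade of this shape, an error at the coarse stage cannot be recovered at the fine stage, which is one reason the coarse decision is usually made over fewer, broader classes.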

Author information

Authors and Affiliations

Data Science R&D, CareerBuilder, Norcross, GA, USA
Faizan Javed & Ferosh Jacob

Corresponding author

Correspondence to Faizan Javed.

Editor information

Editors and Affiliations

University of Derby, Derby, United Kingdom
Marcello Trovati, Richard Hill, Ashiq Anjum, Shao Ying Zhu & Lu Liu

About this chapter

Cite this chapter

Javed, F., Jacob, F. (2015). Data Science and Big Data Analytics at Career Builder. In: Trovati, M., Hill, R., Anjum, A., Zhu, S., Liu, L. (eds) Big-Data Analytics and Cloud Computing. Springer, Cham. https://doi.org/10.1007/978-3-319-25313-8_6

DOI: https://doi.org/10.1007/978-3-319-25313-8_6
Published: 13 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25311-4
Online ISBN: 978-3-319-25313-8
eBook Packages: Computer Science, Computer Science (R0)
