Data Scientist: A Systematic Review of the Literature

  • Marcos Antonio Espinoza Mina
  • Doris Del Pilar Gallegos Barzola
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 895)


Service and production businesses have accumulated large volumes of data over the years, hence today's need for a professional who can interpret data and generate information that leads to valuable results and conclusions. This article presents a systematic review of the literature whose main goal was to identify the work and career profile of the so-called Data Scientist. The review finds that, because this is a new field of work, no concretely defined profiles exist yet, although the relevant knowledge areas are defined, as are the characteristics such a professional needs to have, along with some technologies that can support the work these new specialists do in the IT (Information Technology) area.


Data scientist, Work profile, Career profile



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Marcos Antonio Espinoza Mina (1, 2)
  • Doris Del Pilar Gallegos Barzola (3)
  1. Universidad Ecotec, Samborondón, Ecuador
  2. Universidad Agraria del Ecuador, Guayaquil, Ecuador
  3. MADO S.A., Guayaquil, Ecuador
