
A Survey of Data Scientists in South Africa

  • Conference paper
ICT Education (SACLA 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 730)

Abstract

Academic programmes at South African Higher Education Institutions have predominantly educated students in managing and storing data using relational database technology. However, this is no longer sufficient: South Africa will need to educate more students to manage and process structured, semi-structured and unstructured data. The main purpose of this study was to examine the status in South Africa of data scientists, the role typically associated with managing these new types of data. The study examined the skills, knowledge and qualifications these data scientists require for their daily tasks, and offers suggestions to consider when designing a curriculum for an academic programme in data science.


Notes

  1. Example answers included: “transforming and shifting data”, “processing of large volumes of data (ETL/data pipeline)”, “data preparation for statistical models”, as well as “data warehousing, reporting, ETL development”.


Acknowledgements

Thanks to all respondents who made this study possible. Thanks to Anelize van Biljon for her helpful comments on drafts of this paper. Last but not least, thanks to the anonymous reviewers of SACLA 2017 for their valuable comments and suggestions.

Author information

Correspondence to Eduan Kotzé.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kotzé, E. (2017). A Survey of Data Scientists in South Africa. In: Liebenberg, J., Gruner, S. (eds) ICT Education. SACLA 2017. Communications in Computer and Information Science, vol 730. Springer, Cham. https://doi.org/10.1007/978-3-319-69670-6_12

  • DOI: https://doi.org/10.1007/978-3-319-69670-6_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69669-0

  • Online ISBN: 978-3-319-69670-6

  • eBook Packages: Computer Science, Computer Science (R0)
