A Survey of Data Scientists in South Africa

Part of the Communications in Computer and Information Science book series (CCIS, volume 730)

Abstract

Academic programmes at South African Higher Education Institutions have predominantly educated students in managing and storing data using relational database technology. However, this is no longer sufficient. South Africa as a country will need to educate more students to manage and process structured, semi-structured and unstructured data. The main purpose of this study was to examine the status of data scientists, a role typically associated with managing these new data sets, in South Africa. The study examined the skills, knowledge and qualifications these data scientists require to do their daily tasks, and offers suggestions that ought to be considered when designing a curriculum for an academic programme in data science.

Keywords

  • Big data
  • Data science
  • Curriculum
  • Professional skills

Notes

  1. Example answers included: “transforming and shifting data”, “processing of large volumes of data (ETL/data pipeline)”, “data preparation for statistical models”, as well as “data warehousing, reporting, ETL development”.

Acknowledgements

Thanks to all respondents who made this study possible. Thanks to Anelize van Biljon for her helpful comments on drafts of this paper. Last but not least, thanks to the anonymous reviewers of SACLA’2017 for their valuable comments and suggestions.

Author information

Corresponding author

Correspondence to Eduan Kotzé.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kotzé, E. (2017). A Survey of Data Scientists in South Africa. In: Liebenberg, J., Gruner, S. (eds) ICT Education. SACLA 2017. Communications in Computer and Information Science, vol 730. Springer, Cham. https://doi.org/10.1007/978-3-319-69670-6_12

  • DOI: https://doi.org/10.1007/978-3-319-69670-6_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69669-0

  • Online ISBN: 978-3-319-69670-6

  • eBook Packages: Computer Science, Computer Science (R0)