Abstract
This paper compares the strategies of companies with data science practices and methodologies, and examines the data specificities (variables) that can influence the definition of a data science strategy in pharma companies. The study is empirical: the research approach uses a set of statistical tests to verify the differences between companies with a data science strategy and companies without one. We designed a specific questionnaire and applied it to a sample of 280 pharma companies. The main findings are based on the analysis of the following variables, compared between companies with and without a data science strategy: overwhelming volume, managing unstructured data, data quality, availability of data, access rights to data, data ownership issues, cost of data, lack of pre-processing facilities, lack of technology, shortage of talent/skills, privacy concerns and regulatory risks, security, and difficulties of data portability. The paper offers an in-depth comparative analysis of the two groups. Its key limitation concerns the literature review: because the theme is novel, scientific studies on this specific aspect of data science are lacking. In terms of practical business implications, an organization with a data science strategy will have better direction and management practices, because its decision-making process is based on accurate and valuable data, but it needs data scientists' skills to fulfil those goals.
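As a rough illustration of the kind of two-group comparison described above, the sketch below tests whether ratings of a single variable differ between companies with and without a data science strategy. The abstract does not name the statistical tests used, so a two-sided permutation test on the difference in means is swapped in here as an assumption. The function name `permutation_test`, the 1 to 5 rating scale, and the ratings themselves are hypothetical and are not data from the study.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Hypothetical helper for illustration; not the test used in the paper.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        # Randomly reassign the pooled ratings to the two groups.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Made-up 1-5 ratings of "shortage of talent/skills" (illustrative only).
with_strategy = [2, 3, 2, 1, 3, 2, 2, 3]
without_strategy = [4, 5, 3, 4, 4, 5, 3, 4]

diff, p = permutation_test(with_strategy, without_strategy)
print(f"mean difference = {diff:.2f}, p = {p:.4f}")
```

A permutation test is used in this sketch because it makes no normality assumption about questionnaire ratings; with the 13 variables listed above, repeating such a test per variable would also call for a multiple-comparison correction.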
Funding
This study was funded by Fundação para a Ciência e Tecnologia, Grant: UIDB/00315/2020.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
About this article
Cite this article
Sousa, M.J., Melé, P.M., Pesqueira, A.M. et al. Data science strategies leading to the development of data scientists’ skills in organizations. Neural Comput & Applic 33, 14523–14531 (2021). https://doi.org/10.1007/s00521-021-06095-3
DOI: https://doi.org/10.1007/s00521-021-06095-3