Cloud Spark Cluster to Analyse English Prescription Big Data for NHS Intelligence

Fernando, Sandra; Mydlarz, Victor Sowinski; Katanani, Asya; Virdee, Bal

doi:10.1007/978-981-99-6544-1_27

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 785)

Included in the following conference series:

International Conference on Data Analytics & Management

Abstract

Spark is a large-scale data processing engine that is reported to run up to a hundred times faster than the Hadoop MapReduce engine. Although Spark is a complete in-memory framework, it offers fewer big data platform facilities than Hadoop; paired with the Hadoop Distributed File System (HDFS), however, the Spark analytics engine gives better throughput than Hadoop alone. The main contribution of this paper is insight into the behaviour of an HDFS-based Azure cloud Spark cluster, with discussion and evaluation of its strengths and limitations on a large NHS prescription dataset. The NHS prescription data, covering 2015 to April 2022, exceeds 500 GB. A public dashboard for analysing individual BNF codes exists, as do studies of NHS costs, but no prior analysis has covered this date range and volume of NHS prescription data, in particular with newer big data processing engines such as Spark. This study also contributes descriptive statistics and machine learning models of prescription data trends built with the cloud Spark engine and PySpark, which have not been used in this context before. It profiles regions and GP practices by reimbursement cost, drug consumption level, drug type, and disease type; shows how demand for dispensed chemical substances has varied over the years; and shows which diseases have increased or decreased over the years, together with the total cost and its trends.
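
The kind of descriptive statistic the abstract mentions (total cost by region and year, and the direction of a cost trend) can be illustrated with a minimal sketch. This is not the authors' code: the rows, column order, region names, and helper functions `total_cost_by` and `yearly_trend` are hypothetical, and plain Python stands in for the cluster so the logic is easy to follow. In PySpark the same group-and-sum would typically be written with `DataFrame.groupBy(...).agg(...)`.

```python
from collections import defaultdict

# Hypothetical sample rows: (year, region, BNF chapter, cost in GBP).
# Real NHS prescription files have more columns and different names.
rows = [
    (2020, "London", "Cardiovascular System", 120.0),
    (2020, "London", "Endocrine System", 80.0),
    (2021, "London", "Cardiovascular System", 150.0),
    (2021, "London", "Endocrine System", 70.0),
    (2020, "North West", "Cardiovascular System", 200.0),
    (2021, "North West", "Cardiovascular System", 180.0),
]

COST = 3  # index of the cost column in each row


def total_cost_by(rows, key_indexes):
    """Group rows by the chosen columns and sum the cost column."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        totals[key] += row[COST]
    return dict(totals)


def yearly_trend(rows, chapter):
    """Return ({year: total cost}, change from first to last year) for one chapter."""
    per_year = defaultdict(float)
    for year, _region, bnf_chapter, cost in rows:
        if bnf_chapter == chapter:
            per_year[year] += cost
    years = sorted(per_year)
    if not years:
        return {}, 0.0
    change = per_year[years[-1]] - per_year[years[0]]
    return dict(per_year), change


# Total cost per (region, year).
cost_by_region_year = total_cost_by(rows, key_indexes=(1, 0))

# Yearly totals and overall change for two chapters.
cardio_by_year, cardio_change = yearly_trend(rows, "Cardiovascular System")
endo_by_year, endo_change = yearly_trend(rows, "Endocrine System")
```

A positive change marks a chapter whose total cost rose over the period and a negative change one whose cost fell. On a dataset of the size described, the grouping would be distributed across the cluster rather than run in a single loop.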

Author information

Authors and Affiliations

Assistive Technology Group, SCDM, London Metropolitan University, 166-220 Holloway Rd, London, N7 8DB, UK
Sandra Fernando, Victor Sowinski Mydlarz, Asya Katanani & Bal Virdee

Corresponding author

Correspondence to Sandra Fernando.

Editor information

Editors and Affiliations

Department of Information Technology, Bhagwan Parshuram Institute of Technology, New Delhi, Delhi, India
Abhishek Swaroop
Jan Wyzykowski University, Polkowice, Poland
Zdzislaw Polkowski
Polytechnic Institute of Portalegre, Portalegre, Portugal
Sérgio Duarte Correia
Centre for Communications Technology, London Metropolitan University, London, UK
Bal Virdee

About this paper

Cite this paper

Fernando, S., Mydlarz, V.S., Katanani, A., Virdee, B. (2024). Cloud Spark Cluster to Analyse English Prescription Big Data for NHS Intelligence. In: Swaroop, A., Polkowski, Z., Correia, S.D., Virdee, B. (eds) Proceedings of Data Analytics and Management. ICDAM 2023. Lecture Notes in Networks and Systems, vol 785. Springer, Singapore. https://doi.org/10.1007/978-981-99-6544-1_27

DOI: https://doi.org/10.1007/978-981-99-6544-1_27
Published: 14 January 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6543-4
Online ISBN: 978-981-99-6544-1
eBook Packages: Intelligent Technologies and Robotics (R0)
