Cloud Spark Cluster to Analyse English Prescription Big Data for NHS Intelligence

  • Conference paper
Proceedings of Data Analytics and Management (ICDAM 2023)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 785)

Abstract

Spark is a large-scale data processing engine that is reported to be up to a hundred times faster than the Hadoop big data processing engine. Although Spark is a fully in-memory framework and offers fewer big data platform facilities than Hadoop, the Spark analytics engine running on the Hadoop Distributed File System (HDFS) gives better throughput than Hadoop alone. The main contribution of this paper is insight into the behaviour of an HDFS-based Azure Cloud Spark cluster, with discussion and evaluation of its strengths and limitations using a large NHS prescription dataset. The NHS prescription data, covering 2015 to April 2022, exceeds 500 GB of records. A public dashboard for analysis of individual BNF codes and studies of NHS costs exist, but no analysis of NHS prescriptions over this date range and at this volume, in particular using newer big data processing engines such as Spark, has been conducted. This study also contributes descriptive statistics and machine learning models of prescription data trends built with the Cloud Spark engine and PySpark, a technology that has not been used in this context before. The study compares regions and GP practices in terms of reimbursement cost, drug consumption level, drug type, and disease type; illustrates the varying demand for dispensed chemical substances over the years; and shows which diseases have increased or decreased over the years, as well as the total cost and its trends.

Author information

Corresponding author

Correspondence to Sandra Fernando.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Fernando, S., Mydlarz, V.S., Katanani, A., Virdee, B. (2024). Cloud Spark Cluster to Analyse English Prescription Big Data for NHS Intelligence. In: Swaroop, A., Polkowski, Z., Correia, S.D., Virdee, B. (eds) Proceedings of Data Analytics and Management. ICDAM 2023. Lecture Notes in Networks and Systems, vol 785. Springer, Singapore. https://doi.org/10.1007/978-981-99-6544-1_27
