
A MapReduce-Based Association Rule Mining Using Hadoop Cluster—An Application of Disease Analysis

  • Conference paper
Innovations in Computer Science and Engineering

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 74)

Abstract

As the healthcare industry expands, large amounts of noisy data must be analyzed to obtain meaningful knowledge. Data mining techniques can be applied to remove inconsistent data and extract significant patterns. Association rule mining is a rule-based method that uncovers how items are associated with each other. The Apriori algorithm is widely used to mine the frequent itemsets from which association rules are derived. However, its performance degrades as the volume of data grows, so a parallel and distributed algorithm is required for efficient mining. In this paper, we provide an efficient implementation of the Apriori algorithm in the Hadoop MapReduce framework. We use medical data to produce rules that capture the associations between diseases and their symptoms. These rules can support knowledge discovery and provide guidelines for the healthcare industry.
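The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It simulates the map and reduce phases in a single Python process rather than on a Hadoop cluster. The function names (`map_phase`, `reduce_phase`, `next_candidates`, `apriori`, `rules`), the absolute-count support threshold, and the toy disease and symptom records are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

def map_phase(transaction, candidates):
    """Mapper: emit (itemset, 1) for every candidate contained in the record."""
    items = set(transaction)
    return [(c, 1) for c in candidates if c <= items]

def reduce_phase(pairs, min_count):
    """Reducer: sum the counts per itemset and keep those meeting the threshold."""
    totals = defaultdict(int)
    for itemset, count in pairs:
        totals[itemset] += count
    return {s: n for s, n in totals.items() if n >= min_count}

def next_candidates(level, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent (Apriori pruning)."""
    items = sorted({i for s in level for i in s})
    return [
        frozenset(c)
        for c in combinations(items, k)
        if all(frozenset(sub) in level for sub in combinations(c, k - 1))
    ]

def apriori(transactions, min_count):
    """Run one simulated map/reduce pass per itemset size until none survive."""
    candidates = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    frequent, k = {}, 1
    while candidates:
        pairs = [p for t in transactions for p in map_phase(t, candidates)]
        level = reduce_phase(pairs, min_count)
        frequent.update(level)
        k += 1
        candidates = next_candidates(level, k)
    return frequent

def rules(frequent, min_conf):
    """Derive rules lhs -> rhs with confidence = count(itemset) / count(lhs)."""
    out = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = count / frequent[lhs]
                if conf >= min_conf:
                    out.append((lhs, itemset - lhs, conf))
    return out

# Hypothetical records: each list holds the symptoms and diagnosis of one patient.
records = [
    ["fever", "cough", "flu"],
    ["fever", "cough", "flu"],
    ["fever", "headache"],
    ["cough", "flu"],
]
frequent_itemsets = apriori(records, min_count=2)
strong_rules = rules(frequent_itemsets, min_conf=1.0)
```

In a real Hadoop job, each pass of the `while` loop would correspond to one MapReduce job over the records stored in HDFS, with the candidate itemsets distributed to the mappers. The sketch keeps everything in memory only to make the counting and pruning logic visible.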



Author information

Corresponding author

Correspondence to Namrata Bhattacharya.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Bhattacharya, N., Mondal, S., Khatua, S. (2019). A MapReduce-Based Association Rule Mining Using Hadoop Cluster—An Application of Disease Analysis. In: Saini, H., Sayal, R., Govardhan, A., Buyya, R. (eds) Innovations in Computer Science and Engineering. Lecture Notes in Networks and Systems, vol 74. Springer, Singapore. https://doi.org/10.1007/978-981-13-7082-3_61


  • DOI: https://doi.org/10.1007/978-981-13-7082-3_61


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-7081-6

  • Online ISBN: 978-981-13-7082-3

  • eBook Packages: Engineering, Engineering (R0)
