Abstract
As the healthcare industry expands, large volumes of noisy data must be analyzed to obtain meaningful knowledge. Data mining techniques can be applied to remove inconsistent data and extract significant patterns. Association rule mining is a rule-based method that uncovers how items are associated with each other. The Apriori algorithm is widely used to mine the frequent itemsets from which association rules are derived. However, its performance degrades as the volume of data grows, so a parallel and distributed algorithm is required for efficient mining. In this paper, we provide an efficient implementation of the Apriori algorithm in the Hadoop MapReduce framework. We use medical data to produce rules that capture associations between diseases and their symptoms. These rules can support knowledge discovery and provide guidelines for the healthcare industry.
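The abstract does not show the implementation itself, so the following is only a minimal sketch of how level-wise Apriori counting can be phrased as map and reduce steps. It runs in a single Python process instead of on a Hadoop cluster, and every name in it (map_phase, reduce_phase, generate_candidates, apriori, rules) is hypothetical, as is any symptom data passed to it. The mapper emits a (candidate, 1) pair for each candidate itemset found in a transaction, and the reducer sums the pairs and keeps the itemsets that reach a minimum support count.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction, candidates):
    """Mapper: emit (candidate, 1) for every candidate contained in the transaction."""
    items = set(transaction)
    for cand in candidates:
        if cand <= items:
            yield cand, 1


def reduce_phase(pairs, min_count):
    """Reducer: sum the counts per itemset and keep those meeting min_count."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return {k: v for k, v in counts.items() if v >= min_count}


def generate_candidates(frequent, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent (Apriori pruning)."""
    items = sorted({i for s in frequent for i in s})
    out = set()
    for combo in combinations(items, k):
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1)):
            out.add(frozenset(combo))
    return out


def apriori(transactions, min_count):
    """Run one simulated map/reduce pass per itemset size until no candidates remain."""
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while candidates:
        pairs = (p for t in transactions for p in map_phase(t, candidates))
        level = reduce_phase(pairs, min_count)
        frequent.update(level)
        k += 1
        candidates = generate_candidates(set(level), k)
    return frequent


def rules(frequent, min_conf):
    """Derive rules lhs -> rhs with confidence = support(itemset) / support(lhs)."""
    out = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = count / frequent[lhs]
                if conf >= min_conf:
                    out.append((lhs, itemset - lhs, conf))
    return out
```

In an actual Hadoop job, each pass would typically be a separate MapReduce job, with the candidate set distributed to the mappers and the transactions read from HDFS; the sketch keeps everything in memory purely to make the data flow visible.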
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Bhattacharya, N., Mondal, S., Khatua, S. (2019). A MapReduce-Based Association Rule Mining Using Hadoop Cluster—An Application of Disease Analysis. In: Saini, H., Sayal, R., Govardhan, A., Buyya, R. (eds) Innovations in Computer Science and Engineering. Lecture Notes in Networks and Systems, vol 74. Springer, Singapore. https://doi.org/10.1007/978-981-13-7082-3_61
DOI: https://doi.org/10.1007/978-981-13-7082-3_61
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-7081-6
Online ISBN: 978-981-13-7082-3
eBook Packages: Engineering, Engineering (R0)