Abstract
Formal concept analysis (FCA) is an unsupervised machine learning technique used for knowledge discovery and representation. A major task in FCA is enumerating implications to construct the implication base. Although many efficient classical and parallel algorithms have been proposed for constructing the implication base, they are not well suited to large formal contexts because of their architectural complexity. Existing works use either the stem-base or the proper-premise approach and find the implication base in exponential time. We therefore introduce a distributed algorithm that finds the implication base of larger datasets in polynomial time. The proposed scalable algorithm combines the FP-growth machine learning technique with the Apache Spark big data processing framework and is executed on large formal contexts. Extensive experiments on real-world datasets show that the proposed algorithm improves execution time, CPU usage, and memory usage. Statistical validation of the experimental results confirms that the proposed algorithm is better suited to finding the implication base.
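To make the notion of an attribute implication concrete, the following is a minimal sketch and not the proposed distributed algorithm. It assumes a hypothetical toy formal context (`CONTEXT`) and illustrative helper names (`extent`, `closure`, `holds`), and relies only on the standard FCA fact that an implication A → B holds in a context exactly when B is contained in the closure of A.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy formal context (an assumption for illustration):
# each object is mapped to the set of attributes it has.
CONTEXT: Dict[str, Set[str]] = {
    "o1": {"a", "b", "c"},
    "o2": {"a", "b"},
    "o3": {"b", "c"},
    "o4": {"a", "b", "c"},
}


def extent(attrs: Iterable[str], context: Dict[str, Set[str]]) -> Set[str]:
    """Return the objects that have every attribute in attrs."""
    wanted = set(attrs)
    return {obj for obj, has in context.items() if wanted <= has}


def closure(attrs: Iterable[str], context: Dict[str, Set[str]]) -> Set[str]:
    """Return the attributes shared by all objects in extent(attrs)."""
    objs = extent(attrs, context)
    if not objs:
        # No object matches, so the closure is the full attribute set.
        return set().union(*context.values())
    return set.intersection(*(context[o] for o in objs))


def holds(premise: Iterable[str], conclusion: Iterable[str],
          context: Dict[str, Set[str]]) -> bool:
    """Check whether the implication premise -> conclusion holds."""
    return set(conclusion) <= closure(premise, context)


if __name__ == "__main__":
    print(holds({"a"}, {"b"}, CONTEXT))  # every object with a also has b
    print(holds({"a"}, {"c"}, CONTEXT))  # o2 has a but not c
```

An implication base is a set of such implications from which all valid implications of the context follow. This brute-force check scans every object for each query, which is the cost that a distributed, frequent-pattern-based approach aims to avoid on large contexts.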
Funding
No funding was received for conducting this study.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Chunduri, R.K., Cherukuri, A.K. Scalable algorithm for generation of attribute implication base using FP-growth and spark. Soft Comput 25, 9219–9240 (2021). https://doi.org/10.1007/s00500-021-05844-9
DOI: https://doi.org/10.1007/s00500-021-05844-9