Scalable algorithm for generation of attribute implication base using FP-growth and spark

  • Data analytics and machine learning
Soft Computing

Abstract

Formal concept analysis (FCA) is an unsupervised machine learning technique used for knowledge discovery and representation. A major task in FCA is enumerating implications to construct the implication base. Although many efficient classical and parallel algorithms have been proposed for constructing the implication base, they are not well suited to large formal contexts because of their architectural complexity. All existing works use either the stem-base or the proper-premise approach and find the implication base in exponential time. We therefore introduce a distributed algorithm that finds the implication base of larger datasets quickly, in polynomial time. In this paper, we propose a scalable algorithm that finds the implication base using the machine learning technique FP-growth and the big data processing framework Apache Spark, and we execute it on large formal contexts. Extensive experiments on real-world datasets show that the proposed algorithm improves performance metrics such as execution time, CPU usage and memory usage. Statistical validation of the experimental results confirms that the proposed algorithm has better potential for finding the implication base.
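The abstract does not define what an attribute implication is, so the following is a minimal sketch of the standard FCA notion: an implication A → B holds in a formal context when every object that has all attributes in A also has all attributes in B. This is not the algorithm proposed in the paper, and it uses neither FP-growth nor Spark. The toy context and the names `CONTEXT`, `extent`, `closure` and `holds` are hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical toy formal context: each object maps to the attributes it has.
CONTEXT: Dict[str, FrozenSet[str]] = {
    "o1": frozenset({"a", "b", "c"}),
    "o2": frozenset({"a", "b"}),
    "o3": frozenset({"b", "c"}),
    "o4": frozenset({"c"}),
}

# Every attribute that occurs anywhere in the context.
ALL_ATTRIBUTES: FrozenSet[str] = frozenset().union(*CONTEXT.values())


def extent(attrs: Set[str]) -> Set[str]:
    """Return the objects that have every attribute in `attrs`."""
    return {obj for obj, has in CONTEXT.items() if attrs <= has}


def closure(attrs: Set[str]) -> Set[str]:
    """Return the attributes shared by all objects in extent(attrs)."""
    objs = extent(attrs)
    if not objs:
        # No object has all of `attrs`, so the closure is the full attribute set.
        return set(ALL_ATTRIBUTES)
    return set(frozenset.intersection(*(CONTEXT[o] for o in objs)))


def holds(premise: Set[str], conclusion: Set[str]) -> bool:
    """An implication premise -> conclusion holds iff conclusion is in closure(premise)."""
    return set(conclusion) <= closure(set(premise))


if __name__ == "__main__":
    print(holds({"a"}, {"b"}))       # every object with "a" also has "b"
    print(holds({"b"}, {"a"}))       # o3 has "b" but not "a"
    print(holds({"a", "c"}, {"b"}))  # only o1 has both "a" and "c"
```

Checking every candidate premise this way becomes expensive as the attribute set grows, which is the scalability concern the abstract raises. In association-rule terms, an implication whose premise occurs in at least one object corresponds to a rule with confidence 1. The abstract itself does not spell out how FP-growth is used to obtain such rules.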

Funding

No funding was received for conducting this study.

Author information

Corresponding author

Correspondence to Aswani Kumar Cherukuri.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chunduri, R.K., Cherukuri, A.K. Scalable algorithm for generation of attribute implication base using FP-growth and spark. Soft Comput 25, 9219–9240 (2021). https://doi.org/10.1007/s00500-021-05844-9

  • DOI: https://doi.org/10.1007/s00500-021-05844-9