Abstract
Knowledge computation tasks, such as computing a base of valid implications, are often infeasible for large data sets. This is particularly true when deriving canonical bases in formal concept analysis (FCA). It is therefore necessary to find techniques that reduce the size of a data set while preserving enough structure to extract useful knowledge. Many successful methods rely on random processes to reduce the size of the investigated data set. This, however, makes the discovered knowledge hard to interpret. Other approaches restrict themselves to highly supported subsets and omit rare and possibly interesting patterns. Network science uses an essentially different approach, called k-cores. These cores can reflect rare patterns, as long as they are well connected within the data set. In this work, we study k-cores in the realm of FCA by exploiting the natural correspondence between bipartite graphs and formal contexts. This structurally motivated approach leads to a comprehensible extraction of knowledge cores from large formal contexts.
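The correspondence between formal contexts and bipartite graphs can be made concrete with a small example. The sketch below is not the authors' implementation. It assumes a formal context (G, M, I) is given as a set of (object, attribute) pairs, which is exactly the edge set of a bipartite graph. It then applies the standard k-core peeling: repeatedly delete every vertex with too few neighbours until no such vertex remains.

A few parts of the sketch are illustrative assumptions rather than part of the source:

* The function name `context_core` and the toy data are hypothetical.
* The separate attribute threshold `l` is also hypothetical. With `l = k`, the result is the ordinary k-core of the bipartite graph.

The surviving objects and attributes induce a subcontext of the original context.

```python
def context_core(incidence, k, l=None):
    """Return (objects, attributes) of the core of a formal context.

    incidence: set of (object, attribute) pairs, i.e. the relation I of a
        formal context read as the edge set of a bipartite graph.
    k: minimum number of attributes every remaining object must keep.
    l: minimum number of objects every remaining attribute must keep
        (hypothetical extension; defaults to k, the plain k-core).
    Objects or attributes without any incidence are never included.
    """
    if l is None:
        l = k
    obj_nbrs, attr_nbrs = {}, {}
    for g, m in incidence:
        obj_nbrs.setdefault(g, set()).add(m)
        attr_nbrs.setdefault(m, set()).add(g)

    # Peel low-degree vertices until nothing changes; deleting a vertex
    # lowers its neighbours' degrees, so removals can cascade.
    changed = True
    while changed:
        changed = False
        for g in [g for g, ms in obj_nbrs.items() if len(ms) < k]:
            for m in obj_nbrs.pop(g):
                attr_nbrs[m].discard(g)
            changed = True
        for m in [m for m, gs in attr_nbrs.items() if len(gs) < l]:
            for g in attr_nbrs.pop(m):
                obj_nbrs[g].discard(m)
            changed = True
    return set(obj_nbrs), set(attr_nbrs)


# Hypothetical toy context: three objects, three attributes.
INCIDENCE = {("g1", "a"), ("g1", "b"), ("g2", "a"),
             ("g2", "b"), ("g3", "a"), ("g3", "c")}

if __name__ == "__main__":
    objs, attrs = context_core(INCIDENCE, 2)
    print(sorted(objs), sorted(attrs))  # ['g1', 'g2'] ['a', 'b']
```

In the toy context, every object initially has two attributes, so no object is removed in the first pass. Attribute `c`, however, has only one object and is deleted. This drops `g3` to a single attribute, so `g3` is removed in the next round. The cascade is why the peeling loop must run until a full pass changes nothing.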
Acknowledgements
We thank Robert Jäschke for pointing us to the spice data set.
Funding
Open Access funding enabled and organized by Projekt DEAL. This work was funded by the German Federal Ministry of Education and Research (BMBF) in its program “CIDA - Computational Intelligence & Data Analytics” under grant number 01IS17057.
Ethics declarations
Conflicts of interests/Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Hanika, T., Hirth, J. Knowledge cores in large formal contexts. Ann Math Artif Intell 90, 537–567 (2022). https://doi.org/10.1007/s10472-022-09790-6
DOI: https://doi.org/10.1007/s10472-022-09790-6
Keywords
- k-cores
- Bi-Partite graphs
- Formal concept analysis
- Lattices
- Implications
- Knowledge base