Association Analysis: Basic Concepts and Algorithms

Chen, Qingfeng

doi:10.1007/978-981-99-8251-6_2

Abstract

In 1993, Agrawal et al. pioneered the theory of mining association rules from large databases, a technique used to identify interesting relationships between items in transaction data. Market basket analysis is a typical application of association analysis.
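
To make the idea concrete, the following is a minimal sketch of the two measures conventionally used to score an association rule, support and confidence, computed on a toy market basket dataset. The transactions, the item names, and the helper functions `support` and `confidence` are illustrative assumptions and are not taken from the chapter.

```python
from fractions import Fraction

# Hypothetical toy transactions (illustrative only, not from the chapter).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(itemset <= t for t in transactions)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent, transactions):
    """Support of (antecedent and consequent together) divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Score the candidate rule {diapers} -> {beer}.
rule_support = support({"diapers", "beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
print(rule_support, rule_confidence)
```

Under these assumed transactions, the rule {diapers} → {beer} has support 3/5 and confidence 3/4. Exact fractions are used here only to avoid floating-point rounding in such a small example.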

Notes

1. A random walk describes a path consisting of a series of random steps in some mathematical space. See Sect. 2.7 for details.

References

AGRAWAL R, IMIELIŃSKI T, SWAMI A. Mining association rules between sets of items in large databases[C] //Proceedings of the 1993 ACM SIGMOD international conference on Management of data. 1993: 207–216.
AGRAWAL R, SRIKANT R, et al. Fast algorithms for mining association rules. In Proc. 20th int. conf. very large data bases, VLDB, 1994, 1215:487–499.
HAN J W, KAMBER M, PEI J. Data Mining: Concepts and Techniques (3rd Edition) [M]. Translated by Fan Ming, Meng Xiaofeng. Beijing: Machinery Industry Press, 2012.
TAN P N, STEINBACH M, KUMAR V. Introduction to Data Mining (Full Version) [M]. Translated by Fan Ming, Fan Hongjian. Beijing: People’s Posts and Telecommunications Press, 2021.
LIU P, ZHANG Y. Data Mining[M]. Beijing: Electronic Industry Press, 2018.
SAVASERE A, OMIECINSKI E R, NAVATHE S B. An efficient algorithm for mining association rules in large databases[R]. Georgia Institute of Technology, 1995.
PARK J S, CHEN M S, YU P S. An effective hash-based algorithm for mining association rules[J]. ACM SIGMOD Record, 1995, 24(2): 175–186.
MANNILA H, TOIVONEN H, VERKAMO A I. Efficient algorithms for discovering association rules[C]//KDD-94: AAAI workshop on Knowledge Discovery in Databases. 1994: 181–192.
HAN J, PEI J, YIN Y. Mining frequent patterns without candidate generation[J]. ACM SIGMOD Record, 2000, 29(2): 1–12.
PAWLAK Z. Rough sets[J]. International journal of computer & information sciences, 1982, 11(5): 341–356.
PAWLAK Z, GRZYMALA-BUSSE J, SLOWINSKI R, et al. Rough sets[J]. Communications of the ACM, 1995, 38(11): 88–95.
VIGER P F, LIN C W, KIRAN R U, et al. A survey of sequential pattern mining. Data Science and Pattern Recognition, 2017, 1(1): 54–77.
AGRAWAL R, SRIKANT R. Mining sequential patterns[C] // Proceedings of the eleventh international conference on data engineering. IEEE, 1995: 3–14.
WANG H, DING S. Research and Development of Sequential Pattern Mining [J]. Computer Science, 2009, 36(12): 14–17.
SRIKANT R, AGRAWAL R. Mining sequential patterns: Generalizations and performance improvements[C] //International conference on extending database technology. Springer, Berlin, Heidelberg, 1996: 1–17.
ZHANG M, KAO B, YIP C, et al. A GSP-based efficient algorithm for mining frequent sequences[C] //Proc. of International Conference on Artificial Intelligence. Nevada, 2001.
MASSEGLIA F, CATHALA F, PONCELET P. The PSP approach for mining sequential patterns[C] //Proc. of the 2nd European. Symposium on Principles of Data Mining and Knowledge Discovery. Berlin: Springer-Verlag, 1998, 1510: 176–184.
ZAKI M J. SPADE: An efficient algorithm for mining frequent sequences[J]. Machine Learning, 2001, 41(1): 31–60.
HAN J, PEI J, MORTAZAVI-ASL B, et al. FreeSpan: frequent pattern projected sequential pattern mining[C] // Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining. New York: ACM Press, 2000: 355–359.
HAN J, PEI J, MORTAZAVI-ASL B, et al. PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth[C] //Proceedings of the 17th international conference on data engineering. Washington DC: IEEE Computer Society, 2001: 215–224.
LIN M Y, LEE S Y. Fast discovery of sequential patterns by memory indexing[C] // Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery. London UK: Springer-Verlag, 2002: 150–160.
SUI Y, SHAO F, SUN R, et al. A Sequential Pattern Mining Algorithm Based on Improved FP-tree, 2008 Ninth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, 2008: 440–444.
INOKUCHI A, WASHIO T, MOTODA H. An apriori-based algorithm for mining frequent substructures from graph data[C] //European conference on principles of data mining and knowledge discovery. Springer, Berlin, Heidelberg, 2000: 13–23.
YAN X, HAN J. gSpan: Graph-based substructure pattern mining[C] //2002 IEEE International Conference on Data Mining, 2002. Proceedings. IEEE, 2002: 721–724.
HUAN J, WANG W, PRINS J. Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism. In 2003 IEEE International Conference on Data Mining, 2003: 549–552.
CHEN Q F, LAN C W, CHEN B S, et al. Exploring consensus RNA substructural patterns using subgraph mining. IEEE/ACM transactions on computational biology and bioinformatics. 2016, 14(5): 1134–1146.
VANETIK N, GUDES E, SHIMONY S E. Computing Frequent Graph Patterns from Semi-structured Data. In 2002 IEEE International Conference on Data Mining, 2002: 458–465.
HU H, YAN X, HUANG Y, et al. Mining Coherent Dense Subgraphs across Massive Biological Networks for Functional Discovery. Bioinformatics, 2005, 21(1): 213–221.
FATTA G D, BERTHOLD M R. High Performance Subgraph Mining in Molecular Compounds[C] //International Conference on High Performance Computing and Communications, Sorrento, Italy, 2005: 866–877.
Zhang Wei. Research on Frequent Subgraph Mining Algorithm[D]. Yanshan University, 2011.
WASHIO T, MOTODA H. State of the art of graph based data mining[J]. ACM SIGKDD Explorations Newsletter, 2003, 5(1): 59–68.
COOK D J, HOLDER L B. Substructure discovery using minimum description length and background knowledge[J]. Journal of Artificial Intelligence Research, 1993: 231–255.
INOKUCHI A, WASHIO T, NISHIMURA K, et al. A Fast Algorithm for Mining Frequent Connected Subgraphs. IBM Research Report. 2002.
KURAMOCHI M, KARYPIS G. Frequent Subgraph Discovery. In Proceedings 2001 IEEE international conference on data mining, 2001: 313–320.
YAN X, HAN J. Closegraph: mining closed frequent graph patterns[C] //Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. 2003: 286–295.
HUAN J, WANG W, PRINS J. Efficient mining of frequent subgraphs in the presence of isomorphism. Third IEEE international conference on data mining. 2003: 549–552.
SAVASERE A, OMIECINSKI E, NAVATHE S. Mining for strong negative associations in a large database of customer transactions[C] //Proceedings 14th International Conference on Data Engineering. IEEE, 1998: 494–502.
WU X, ZHANG C, ZHANG S. Mining both positive and negative association rules[C] //International Conference on Machine Learning. 2002, 2: 658–665.
ANTONIE M L, ZAÏANE O R. Mining positive and negative association rules: An approach for confined rules[C] //European Conference on Principles of Data Mining and Knowledge Discovery. 2004: 27–38.
ZHANG S C, CHEN F, WU X D. Identifying bridging rules between conceptual clusters[C] //Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. 2006: 815–820.
MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[J]. arXiv preprint arXiv:1301.3781, 2013.
Ng P. dna2vec: Consistent vector representations of variable-length k-mers. arXiv preprint arXiv:1701.06279, 2017.
WANG Y, HOU Y, CHE W, et al. From static to dynamic word representations: a survey[J]. International Journal of Machine Learning and Cybernetics, 2020, 11(7): 1611–1630.
McClelland J L, Rumelhart D E, PDP Research Group. Parallel distributed processing[M]. Cambridge, MA: MIT press, 1986.
HUANG F, YATES A. Distributional representations for handling sparsity in supervised sequence-labeling[C] //Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. 2009: 495–503.
DAGAN I, PEREIRA F, LEE L. Similarity-based estimation of word cooccurrence probabilities[J]. arXiv preprint cmp-lg/9405001, 1994.
DEERWESTER S, DUMAIS S T, FURNAS G W, et al. Indexing by latent semantic analysis[J]. Journal of the American society for information science. 1990, 41(6): 391–407.
BLEI D M, NG A, JORDAN M I. Latent dirichlet allocation[J]. The Journal of Machine Learning Research. 2003, 3: 993–1022.
BENGIO Y, DUCHARME R, VINCENT P. A neural probabilistic language model[J]. Advances in Neural Information Processing Systems, 2003: 1137–1155.
PENNINGTON J, SOCHER R, MANNING C D. Glove: global vectors for word representation. Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014: 1532–1543.
MCCANN B, BRADBURY J, XIONG C, et al. Learned in translation: contextualized word vectors. Advances in neural information processing systems. 2017, 30:6294–6305.
PETERS M, NEUMANN M, IYYER M, et al. Deep contextualized word representations. Proceedings of the 2018 conference of the north American chapter of the association for computational linguistics: human language technologies. 2018, 1: 2227–2237.
Heinzinger M, Elnaggar A, Wang Y, et al. Modeling aspects of the language of life through transfer-learning protein sequences. BMC Bioinformatics. 2019, 20(1): 1–7.
DEVLIN J, CHANG M W, LEE K, et al. Bert: Pretraining of deep bidirectional transformers for language understanding[C]//Proceedings of NAACL-HLT. 2019: 4171–4186.
PEARSON K. The problem of the random walk[J]. Nature, 1905, 72(1865): 294–294.
PAGE L, BRIN S, MOTWANI R, et al. The PageRank citation ranking: Bringing order to the web[R]. Stanford InfoLab, 1999.
PAN J Y, YANG H J, FALOUTSOS C, et al. Automatic multimedia cross-modal correlation discovery[C] //Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining. 2004: 653–658.
SHEN J, DU Y, WANG W, et al. Lazy random walks for superpixel segmentation[J]. IEEE Transactions on Image Processing, 2014, 23(4): 1451–1462.
LOVÁSZ L. Random walks on graphs: A survey, Combinatorics, Paul Erdos Eighty[J]. lecture notes in mathematics, 1993, 2(1): 1–46.
BRAND M. A random walks perspective on maximizing satisfaction and profit[C] //Proceedings of the 2005 SIAM international conference on data mining. Society for Industrial and Applied Mathematics, 2005: 12–19.
GORI M, PUCCI A, ROMA V, et al. Itemrank: A random-walk based scoring algorithm for recommender engines[C]//International Joint Conference on Artificial Intelligence. 2007, 7: 2766–2771.
XIA F, LIU H F, LEE I, et al. Scientific article recommendation: Exploiting common author relations and historical preferences[J]. IEEE Transactions on Big Data, 2016, 2(2): 101–112.
LIU W P, LÜ L Y. Link prediction based on local random walk[J]. EPL (Europhysics Letters), 2010, 89(5): 58007.
BACKSTROM L, LESKOVEC J. Supervised random walks: predicting and recommending links in social networks[C] //Proceedings of the fourth ACM international conference on Web search and data mining. 2011: 635–644.
SINGHAL A. Introducing the knowledge graph: things, not strings[Z]. Official Google Blog. 2012.
McCray AT. An upper-level ontology for the biomedical domain. Comparative and Functional genomics. 2003, 4(1): 80–4.
Li Danya, Hu Tiejun, Li Junlian, Qian Qing, Zhu Wenyan. Construction and Application of Chinese Integrated Medical Language System[J]. Journal of Information, 2011,30(02):147–151.
Aodema, Yang Yunfei, Sui Zhifang, et al. A Preliminary Study on the Construction of Chinese Medical Knowledge Graph CMeKG [J]. Chinese Journal of Information, 2019, 33(10): 1–9.
Guan S, Jin X, Jia Y, et al. Research Progress on Knowledge Reasoning Based on Knowledge Graph[J]. Journal of Software, 2018, 29(10): 2966–2994.
CHEN X J, JIA S B, XIANG Y. A review: Knowledge reasoning over knowledge graph[J]. Expert Systems with Applications, 2020, 141: 112948.
SCHOENMACKERS S, DAVIS J, ETZIONI O, et al. Learning first-order horn clauses from web text[C] //Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. 2010: 1088–1098.
NAKASHOLE N, SOZIO M, SUCHANEK F M, et al. Query-time reasoning in uncertain RDF knowledge bases with soft and hard rules[J]. International Conference on Very Large Data Bases, 2012, 884: 15–20.
GALÁRRAGA L A, TEFLIOUDI C, HOSE K, et al. AMIE: association rule mining under incomplete evidence in ontological knowledge bases[C] //Proceedings of the 22nd international conference on World Wide Web. 2013: 413–422.
MITCHELL T, COHEN W, HRUSCHKA E, et al. Never-ending learning[J]. Communications of the ACM, 2018, 61(5): 103–115.
PAULHEIM H, BIZER C. Improving the quality of linked data using statistical distributions[J]. International Journal on Semantic Web and Information Systems (IJSWIS), 2014, 10(2): 63–86.
JANG S, MEGAWATI M, CHOI J, et al. Semi-automatic quality assessment of linked data without requiring ontology[C] //Proceedings of the Third NLP&DBpedia Workshop (NLP & DBpedia 2015) co-located with the 14th International Semantic Web Conference 2015 (ISWC 2015). 2015: 45–55.
WANG W Y, MAZAITIS K, COHEN W W. Programming with personalized pagerank: a locally groundable first-order probabilistic logic[C] //Proceedings of the 22nd ACM international conference on Information & Knowledge Management. 2013: 2129–2138.
CATHERINE R, COHEN W. Personalized recommendations using knowledge graphs: A probabilistic logic programming approach[C] //Proceedings of the 10th ACM conference on recommender systems. 2016: 325–332.
JIANG S P, LOWD D, DOU D J. Learning to refine an automatically extracted knowledge base using markov logic[C] //2012 IEEE 12th International Conference on Data Mining. 2012: 912–917.
CHEN Y, WANG D Z. Knowledge expansion over probabilistic knowledge bases[C] //Proceedings of the 2014 ACM SIGMOD international conference on Management of data. 2014: 649–660.
KUŽELKA O, DAVIS J. Markov logic networks for knowledge base completion: A theoretical analysis under the MCAR assumption[C] //Uncertainty in Artificial Intelligence. PMLR, 2020: 1138–1148.
KIMMIG A, BACH S, BROECHELER M, et al. A short introduction to probabilistic soft logic[C]//Proceedings of the NIPS workshop on probabilistic programming: foundations and applications. 2012: 1–4.
NICKEL M, TRESP V, KRIEGEL H P. A three-way model for collective learning on multi-relational data[C] //International Conference on Machine Learning. 2011.
BORDES A, USUNIER N, GARCIA-DURAN A, et al. Translating embeddings for modeling multi-relational data[J]. Advances in neural information processing systems, 2013, 26.
BORDES A, GLOROT X, WESTON J, et al. Joint learning of words and meaning representations for open-text semantic parsing[C] //Artificial intelligence and statistics. PMLR, 2012: 127–135.
SOCHER R, CHEN D, MANNING C D, et al. Reasoning with neural tensor networks for knowledge base completion[J]. Advances in neural information processing systems, 2013, 26.
CHEN D, SOCHER R, MANNING C D, et al. Learning new facts from knowledge bases with neural tensor networks and semantic word vectors[J]. arXiv preprint arXiv:1301.3618, 2013.
SHI B, WENINGER T. Proje: Embedding projection for knowledge graph completion[C] //Proceedings of the AAAI Conference on Artificial Intelligence. 2017, 31(1).
LIU Q, JIANG H, EVDOKIMOV A, et al. Probabilistic reasoning via deep learning: Neural association models[J]. arXiv preprint arXiv:1603.07704, 2016.
WANG X, SONG X. Design of Network Security Vulnerability Type Correlation Analysis System Based on Knowledge Graph [J]. Electronic Design Engineering, 2021, 29(17):85–89.
GUO J. Research on Association Analysis Method of Aviation Safety Events Based on Knowledge Graph[D]. Civil Aviation University of China, 2020.
LI Y. Construction and application of knowledge graph for natural disaster emergency response [D]. Wuhan University, 2021.
LIU B. Research on Association Analysis Technology of Cyberspace Resources Based on Knowledge Graph [D]. Huazhong University of Science and Technology, 2019.
WANG W. Research on Association Analysis Technology of Distributed Security Events Based on Knowledge Graph [D]. National University of Defense Technology, 2018.
CHEN X. Research on Information Association Analysis Method Based on Knowledge Graph [D]. Harbin Engineering University, 2018.
Wu Jiamin. Construction and analysis of lung cancer medical knowledge map [D]. Ningxia University, 2019.
Nordon G, Koren G, Shalev V, et al. Separating wheat from chaff: Joining biomedical knowledge and patient data for repurposing medications[C] //Proceedings of the AAAI Conference on Artificial Intelligence. 2019, 33(01): 9565–9572.

Author information

Authors and Affiliations

School of Computer and Electronic Information, Guangxi University, Nanning, Guangxi, China
Qingfeng Chen

About this chapter

Cite this chapter

Chen, Q. (2024). Association Analysis: Basic Concepts and Algorithms. In: Association Analysis Techniques and Applications in Bioinformatics. Springer, Singapore. https://doi.org/10.1007/978-981-99-8251-6_2

DOI: https://doi.org/10.1007/978-981-99-8251-6_2
Published: 26 April 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8250-9
Online ISBN: 978-981-99-8251-6
eBook Packages: Computer Science, Computer Science (R0)