Abstract
In 1993, Agrawal et al. pioneered the theory of mining association rules from large databases, a technique for identifying interesting relationships between items in market basket transactions. Market basket analysis is a typical application of association analysis.
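Two standard rule measures are support and confidence. Support is the fraction of transactions that contain an itemset. Confidence is the support of the whole rule divided by the support of its antecedent. The minimal Python sketch below computes both over a small, made-up set of market basket transactions. The transactions and the helper names `support` and `confidence` are hypothetical and only illustrate the definitions. This is not the chapter's own implementation.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy market basket data: each transaction is the set of items bought together.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer", "eggs"}),
    frozenset({"milk", "diapers", "beer", "cola"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "cola"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)  # <= is the subset test
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Confidence of the rule antecedent -> consequent: support(X ∪ Y) / support(X)."""
    lhs = frozenset(antecedent)
    lhs_support = support(lhs, transactions)
    if lhs_support == 0:
        return 0.0  # convention chosen here: an antecedent that never occurs yields 0
    return support(lhs | frozenset(consequent), transactions) / lhs_support


if __name__ == "__main__":
    print(support({"milk", "diapers", "beer"}, TRANSACTIONS))
    print(confidence({"milk", "diapers"}, {"beer"}, TRANSACTIONS))
```

Three of these toy transactions contain milk and diapers, and two of those also contain beer. The rule {milk, diapers} → {beer} therefore has support 2/5 = 0.4 and confidence 2/3, or about 0.67.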
Notes
1. A random walk is a path consisting of a succession of random steps in some mathematical space. See Sect. 2.7 for details.
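Several works cited below, such as PageRank and random-walk link prediction, build on walks over a graph. The minimal Python sketch below shows a simple random walk in that setting. It assumes an unweighted graph stored as an adjacency list and, at each step, moves to a neighbor chosen uniformly at random. The graph, the function `random_walk`, and its parameters are hypothetical. Variants such as lazy walks, walks with restart, or weighted transition probabilities change only the rule for choosing the next node.

```python
import random
from typing import Dict, List, Optional

# Hypothetical toy undirected graph stored as an adjacency list.
GRAPH: Dict[str, List[str]] = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B"],
}


def random_walk(graph: Dict[str, List[str]], start: str, steps: int,
                rng: Optional[random.Random] = None) -> List[str]:
    """Simple random walk: at each step, move to a neighbor chosen uniformly at random.

    Returns the visited nodes, beginning with `start`. The walk stops early if it
    reaches a node with no neighbors.
    """
    rng = rng if rng is not None else random.Random()
    path = [start]
    current = start
    for _ in range(steps):
        neighbors = graph.get(current, [])
        if not neighbors:
            break  # dead end: no edge to follow
        current = rng.choice(neighbors)
        path.append(current)
    return path


if __name__ == "__main__":
    # A seeded generator makes the walk reproducible.
    print(random_walk(GRAPH, "A", steps=10, rng=random.Random(42)))
```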
References
AGRAWAL R, IMIELIŃSKI T, SWAMI A. Mining association rules between sets of items in large databases[C] //Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data. 1993: 207–216.
AGRAWAL R, SRIKANT R. Fast algorithms for mining association rules[C] //Proceedings of the 20th International Conference on Very Large Data Bases (VLDB). 1994: 487–499.
HAN J W, KAMBER M, PEI J. Data Mining: Concepts and Techniques (3rd Edition) [M]. Translated by Fan Ming, Meng Xiaofeng. Beijing: Machinery Industry Press, 2012.
TAN P N, STEINBACH M, KUMAR V. Introduction to Data Mining (Full Version) [M]. Translated by Fan Ming, Fan Hongjian. Beijing: People’s Posts and Telecommunications Press, 2021.
LIU P, ZHANG Y. Data Mining[M]. Beijing: Electronic Industry Press, 2018.
SAVASERE A, OMIECINSKI E R, NAVATHE S B. An efficient algorithm for mining association rules in large databases[R]. Georgia Institute of Technology, 1995.
PARK J S, CHEN M S, YU P S. An effective hash-based algorithm for mining association rules[J]. ACM SIGMOD Record, 1995, 24(2): 175–186.
MANNILA H, TOIVONEN H, VERKAMO A I. Efficient algorithms for discovering association rules[C]//KDD-94: AAAI workshop on Knowledge Discovery in Databases. 1994: 181–192.
HAN J, PEI J, YIN Y. Mining frequent patterns without candidate generation[J]. ACM SIGMOD Record, 2000, 29(2): 1–12.
PAWLAK Z. Rough sets[J]. International Journal of Computer & Information Sciences, 1982, 11(5): 341–356.
PAWLAK Z, GRZYMALA-BUSSE J, SLOWINSKI R, et al. Rough sets[J]. Communications of the ACM, 1995, 38(11): 88–95.
FOURNIER-VIGER P, LIN C W, KIRAN R U, et al. A survey of sequential pattern mining[J]. Data Science and Pattern Recognition, 2017, 1(1): 54–77.
AGRAWAL R, SRIKANT R. Mining sequential patterns[C] // Proceedings of the eleventh international conference on data engineering. IEEE, 1995: 3–14.
WANG H, DING S. Research and Development of Sequential Pattern Mining [J]. Computer Science, 2009, 36(12): 14–17.
SRIKANT R, AGRAWAL R. Mining sequential patterns: Generalizations and performance improvements[C] //International conference on extending database technology. Springer, Berlin, Heidelberg, 1996: 1–17.
ZHANG M, KAO B, YIP C, et al. A GSP-based efficient algorithm for mining frequent sequences[C] //Proc. of International Conference on Artificial Intelligence. Nevada, 2001.
MASSEGLIA F, CATHALA F, PONCELET P. The PSP approach for mining sequential patterns[C] //Proc. of the 2nd European Symposium on Principles of Data Mining and Knowledge Discovery. Berlin: Springer-Verlag, 1998, 1510: 176–184.
ZAKI M J. SPADE: An efficient algorithm for mining frequent sequences[J]. Machine Learning, 2001, 41(1): 31–60.
HAN J, PEI J, MORTAZAVI-ASL B, et al. FreeSpan: frequent pattern projected sequential pattern mining[C] // Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining. New York: ACM Press, 2000: 355–359.
HAN J, PEI J, MORTAZAVI-ASL B, et al. PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth[C] //Proceedings of the 17th international conference on data engineering. Washington DC: IEEE Computer Society, 2001: 215–224.
LIN M Y, LEE S Y. Fast discovery of sequential patterns by memory indexing[C] // Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery. London UK: Springer-Verlag, 2002: 150–160.
SUI Y, SHAO F, SUN R, et al. A Sequential Pattern Mining Algorithm Based on Improved FP-tree, 2008 Ninth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, 2008: 440–444.
INOKUCHI A, WASHIO T, MOTODA H. An apriori-based algorithm for mining frequent substructures from graph data[C] //European conference on principles of data mining and knowledge discovery. Springer, Berlin, Heidelberg, 2000: 13–23.
YAN X, HAN J. gSpan: Graph-based substructure pattern mining[C] //2002 IEEE International Conference on Data Mining, 2002. Proceedings. IEEE, 2002: 721–724.
HUAN J, WANG W, PRINS J. Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism. In 2003 IEEE International Conference on Data Mining, 2003: 549–552.
CHEN Q F, LAN C W, CHEN B S, et al. Exploring consensus RNA substructural patterns using subgraph mining. IEEE/ACM transactions on computational biology and bioinformatics. 2016, 14(5): 1134–1146.
VANETIK N, GUDES E, SHIMONY S E. Computing Frequent Graph Patterns from Semi-structured Data. In 2002 IEEE International Conference on Data Mining, 2002: 458–465.
HU H, YAN X, HUANG Y, et al. Mining Coherent Dense Subgraphs across Massive Biological Networks for Functional Discovery. Bioinformatics, 2005, 21(1): 213–221.
DI FATTA G, BERTHOLD M R. High Performance Subgraph Mining in Molecular Compounds[C] //International Conference on High Performance Computing and Communications, Sorrento, Italy, 2005: 866–877.
ZHANG W. Research on Frequent Subgraph Mining Algorithm[D]. Yanshan University, 2011.
WASHIO T, MOTODA H. State of the art of graph based data mining[J]. ACM SIGKDD Explorations Newsletter, 2003, 5(1): 59–68.
COOK D J, HOLDER L B. Substructure discovery using minimum description length and background knowledge[J]. Journal of Artificial Intelligence Research, 1993: 231–255.
INOKUCHI A, WASHIO T, NISHIMURA K, et al. A Fast Algorithm for Mining Frequent Connected Subgraphs. IBM Research Report. 2002.
KURAMOCHI M, KARYPIS G. Frequent Subgraph Discovery. In Proceedings 2001 IEEE international conference on data mining, 2001: 313–320.
YAN X, HAN J. CloseGraph: mining closed frequent graph patterns[C] //Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. 2003: 286–295.
HUAN J, WANG W, PRINS J. Efficient mining of frequent subgraphs in the presence of isomorphism[C] //Third IEEE International Conference on Data Mining. 2003: 549–552.
SAVASERE A, OMIECINSKI E, NAVATHE S. Mining for strong negative associations in a large database of customer transactions[C] //Proceedings 14th International Conference on Data Engineering. IEEE, 1998: 494–502.
WU X, ZHANG C, ZHANG S. Mining both positive and negative association rules[C] //International Conference on Machine Learning. 2002, 2: 658–665.
ANTONIE M L, ZAÏANE O R. Mining positive and negative association rules: An approach for confined rules[C] //European Conference on Principles of Data Mining and Knowledge Discovery. 2004: 27–38.
ZHANG S C, CHEN F, WU X D. Identifying bridging rules between conceptual clusters[C] //Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. 2006: 815–820.
MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[J]. arXiv preprint arXiv:1301.3781, 2013.
NG P. dna2vec: Consistent vector representations of variable-length k-mers[J]. arXiv preprint arXiv:1701.06279, 2017.
WANG Y, HOU Y, CHE W, et al. From static to dynamic word representations: a survey[J]. International Journal of Machine Learning and Cybernetics, 2020, 11(7): 1611–1630.
MCCLELLAND J L, RUMELHART D E, PDP Research Group. Parallel distributed processing[M]. Cambridge, MA: MIT Press, 1986.
HUANG F, YATES A. Distributional representations for handling sparsity in supervised sequence-labeling[C] //Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. 2009: 495–503.
DAGAN I, PEREIRA F, LEE L. Similarity-based estimation of word cooccurrence probabilities[J]. arXiv preprint cmp-lg/9405001, 1994.
DEERWESTER S, DUMAIS S T, FURNAS G W, et al. Indexing by latent semantic analysis[J]. Journal of the American society for information science. 1990, 41(6): 391–407.
BLEI D M, NG A, JORDAN M I. Latent Dirichlet allocation[J]. The Journal of Machine Learning Research, 2003, 3: 993–1022.
BENGIO Y, DUCHARME R, VINCENT P. A neural probabilistic language model[J]. Journal of Machine Learning Research, 2003, 3: 1137–1155.
PENNINGTON J, SOCHER R, MANNING C D. GloVe: Global vectors for word representation[C] //Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014: 1532–1543.
MCCANN B, BRADBURY J, XIONG C, et al. Learned in translation: contextualized word vectors. Advances in neural information processing systems. 2017, 30:6294–6305.
PETERS M, NEUMANN M, IYYER M, et al. Deep contextualized word representations. Proceedings of the 2018 conference of the north American chapter of the association for computational linguistics: human language technologies. 2018, 1: 2227–2237.
HEINZINGER M, ELNAGGAR A, WANG Y, et al. Modeling aspects of the language of life through transfer-learning protein sequences[J]. BMC Bioinformatics, 2019, 20(1): 1–7.
DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[C] //Proceedings of NAACL-HLT. 2019: 4171–4186.
PEARSON K. The problem of the random walk[J]. Nature, 1905, 72(1865): 294.
PAGE L, BRIN S, MOTWANI R, et al. The PageRank citation ranking: Bringing order to the web[R]. Stanford InfoLab, 1999.
PAN J Y, YANG H J, FALOUTSOS C, et al. Automatic multimedia cross-modal correlation discovery[C] //Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining. 2004: 653–658.
SHEN J, DU Y, WANG W, et al. Lazy random walks for superpixel segmentation[J]. IEEE Transactions on Image Processing, 2014, 23(4): 1451–1462.
LOVÁSZ L. Random walks on graphs: a survey[J]. Combinatorics, Paul Erdős is Eighty, 1993, 2(1): 1–46.
BRAND M. A random walks perspective on maximizing satisfaction and profit[C] //Proceedings of the 2005 SIAM international conference on data mining. Society for Industrial and Applied Mathematics, 2005: 12–19.
GORI M, PUCCI A, ROMA V, et al. ItemRank: A random-walk based scoring algorithm for recommender engines[C] //International Joint Conference on Artificial Intelligence. 2007, 7: 2766–2771.
XIA F, LIU H F, LEE I, et al. Scientific article recommendation: Exploiting common author relations and historical preferences[J]. IEEE Transactions on Big Data, 2016, 2(2): 101–112.
LIU W P, LÜ L Y. Link prediction based on local random walk[J]. EPL (Europhysics Letters), 2010, 89(5): 58007.
BACKSTROM L, LESKOVEC J. Supervised random walks: predicting and recommending links in social networks[C] //Proceedings of the fourth ACM international conference on Web search and data mining. 2011: 635–644.
SINGHAL A. Introducing the knowledge graph: things, not strings[Z]. Official Google Blog. 2012.
MCCRAY A T. An upper-level ontology for the biomedical domain[J]. Comparative and Functional Genomics, 2003, 4(1): 80–84.
LI D Y, HU T J, LI J L, QIAN Q, ZHU W Y. Construction and Application of Chinese Integrated Medical Language System[J]. Journal of Information, 2011, 30(2): 147–151.
AODEMA, YANG Y F, SUI Z F, et al. A Preliminary Study on the Construction of Chinese Medical Knowledge Graph CMeKG[J]. Journal of Chinese Information Processing, 2019, 33(10): 1–9.
GUAN S, JIN X, JIA Y, et al. Research Progress on Knowledge Reasoning Based on Knowledge Graph[J]. Journal of Software, 2018, 29(10): 2966–2994.
CHEN X J, JIA S B, XIANG Y. A review: Knowledge reasoning over knowledge graph[J]. Expert Systems with Applications, 2020, 141: 112948.
SCHOENMACKERS S, DAVIS J, ETZIONI O, et al. Learning first-order horn clauses from web text[C] //Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. 2010: 1088–1098.
NAKASHOLE N, SOZIO M, SUCHANEK F M, et al. Query-time reasoning in uncertain RDF knowledge bases with soft and hard rules[J]. International Conference on Very Large Data Bases, 2012, 884: 15–20.
GALÁRRAGA L A, TEFLIOUDI C, HOSE K, et al. AMIE: association rule mining under incomplete evidence in ontological knowledge bases[C] //Proceedings of the 22nd international conference on World Wide Web. 2013: 413–422.
MITCHELL T, COHEN W, HRUSCHKA E, et al. Never-ending learning[J]. Communications of the ACM, 2018, 61(5): 103–115.
PAULHEIM H, BIZER C. Improving the quality of linked data using statistical distributions[J]. International Journal on Semantic Web and Information Systems (IJSWIS), 2014, 10(2): 63–86.
JANG S, MEGAWATI M, CHOI J, et al. Semi-automatic quality assessment of linked data without requiring ontology[C] //Proceedings of the Third NLP&DBpedia Workshop (NLP & DBpedia 2015) co-located with the 14th International Semantic Web Conference 2015 (ISWC 2015). 2015: 45–55.
WANG W Y, MAZAITIS K, COHEN W W. Programming with personalized pagerank: a locally groundable first-order probabilistic logic[C] //Proceedings of the 22nd ACM international conference on Information & Knowledge Management. 2013: 2129–2138.
CATHERINE R, COHEN W. Personalized recommendations using knowledge graphs: A probabilistic logic programming approach[C] //Proceedings of the 10th ACM conference on recommender systems. 2016: 325–332.
JIANG S P, LOWD D, DOU D J. Learning to refine an automatically extracted knowledge base using markov logic[C] //2012 IEEE 12th International Conference on Data Mining. 2012: 912–917.
CHEN Y, WANG D Z. Knowledge expansion over probabilistic knowledge bases[C] //Proceedings of the 2014 ACM SIGMOD international conference on Management of data. 2014: 649–660.
KUŽELKA O, DAVIS J. Markov logic networks for knowledge base completion: A theoretical analysis under the MCAR assumption[C] //Uncertainty in Artificial Intelligence. PMLR, 2020: 1138–1148.
KIMMIG A, BACH S, BROECHELER M, et al. A short introduction to probabilistic soft logic[C]//Proceedings of the NIPS workshop on probabilistic programming: foundations and applications. 2012: 1–4.
NICKEL M, TRESP V, KRIEGEL H P. A three-way model for collective learning on multi-relational data[C] //International Conference on Machine Learning. 2011.
BORDES A, USUNIER N, GARCIA-DURAN A, et al. Translating embeddings for modeling multi-relational data[J]. Advances in neural information processing systems, 2013, 26.
BORDES A, GLOROT X, WESTON J, et al. Joint learning of words and meaning representations for open-text semantic parsing[C] //Artificial intelligence and statistics. PMLR, 2012: 127–135.
SOCHER R, CHEN D, MANNING C D, et al. Reasoning with neural tensor networks for knowledge base completion[J]. Advances in neural information processing systems, 2013, 26.
CHEN D, SOCHER R, MANNING C D, et al. Learning new facts from knowledge bases with neural tensor networks and semantic word vectors[J]. arXiv preprint arXiv:1301.3618, 2013.
SHI B, WENINGER T. ProjE: Embedding projection for knowledge graph completion[C] //Proceedings of the AAAI Conference on Artificial Intelligence. 2017, 31(1).
LIU Q, JIANG H, EVDOKIMOV A, et al. Probabilistic reasoning via deep learning: Neural association models[J]. arXiv preprint arXiv:1603.07704, 2016.
WANG X, SONG X. Design of Network Security Vulnerability Type Correlation Analysis System Based on Knowledge Graph [J]. Electronic Design Engineering, 2021, 29(17):85–89.
GUO J. Research on Association Analysis Method of Aviation Safety Events Based on Knowledge Graph[D]. Civil Aviation University of China, 2020.
LI Y. Construction and application of knowledge graph for natural disaster emergency response [D]. Wuhan University, 2021.
LIU B. Research on Association Analysis Technology of Cyberspace Resources Based on Knowledge Graph [D]. Huazhong University of Science and Technology, 2019.
WANG W. Research on Association Analysis Technology of Distributed Security Events Based on Knowledge Graph [D]. National University of Defense Technology, 2018.
CHEN X. Research on Information Association Analysis Method Based on Knowledge Graph [D]. Harbin Engineering University, 2018.
WU J M. Construction and analysis of lung cancer medical knowledge graph[D]. Ningxia University, 2019.
NORDON G, KOREN G, SHALEV V, et al. Separating wheat from chaff: Joining biomedical knowledge and patient data for repurposing medications[C] //Proceedings of the AAAI Conference on Artificial Intelligence. 2019, 33(01): 9565–9572.
Copyright information
© 2024 Guangxi Education Publishing House
About this chapter
Cite this chapter
Chen, Q. (2024). Association Analysis: Basic Concepts and Algorithms. In: Association Analysis Techniques and Applications in Bioinformatics. Springer, Singapore. https://doi.org/10.1007/978-981-99-8251-6_2
DOI: https://doi.org/10.1007/978-981-99-8251-6_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8250-9
Online ISBN: 978-981-99-8251-6
eBook Packages: Computer Science, Computer Science (R0)