Association Analysis: Basic Concepts and Algorithms

Association Analysis Techniques and Applications in Bioinformatics

Abstract

In 1993, Agrawal et al. pioneered the mining of association rules from large databases, a technique for identifying interesting relationships between items in market basket transaction data. Market basket analysis remains a typical application of association analysis.

Notes

  1. A random walk is a path consisting of a series of random steps in some mathematical space. See Sect. 2.7 for details.

References

  1. AGRAWAL R, IMIELIŃSKI T, Swami A. Mining association rules between sets of items in large databases[C] //Proceedings of the 1993 ACM SIGMOD international conference on Management of data. 1993: 207–216.

  2. AGRAWAL R, SRIKANT R, et al. Fast algorithms for mining association rules. In Proc. 20th int. conf. very large data bases, VLDB, 1994, 1215:487–499.

  3. HAN J W, KAMBER M, Pei J. Data Mining: Concepts and Techniques (3rd Edition) [M]. Translated by Fan Ming, Meng Xiaofeng. Beijing: Machinery Industry Press, 2012.

  4. TAN P N, STEINBACH M, KUMAR V. Introduction to Data Mining (Full Version) [M]. Translated by Fan Ming, Fan Hongjian. Beijing: People’s Posts and Telecommunications Press, 2021.

  5. LIU P, ZHANG Y. Data Mining[M]. Beijing: Electronic Industry Press, 2018.

  6. SAVASERE A, OMIECINSKI E R, NAVATHE S B. An efficient algorithm for mining association rules in large databases[R]. Georgia Institute of Technology, 1995.

  7. PARK J S, CHEN M S, YU P S. An effective hash-based algorithm for mining association rules[J]. Acm sigmod record, 1995, 24(2): 175–186.

  8. MANNILA H, TOIVONEN H, VERKAMO A I. Efficient algorithms for discovering association rules[C]//KDD-94: AAAI workshop on Knowledge Discovery in Databases. 1994: 181–192.

  9. HAN J, PEI J, YIN Y. Mining frequent patterns without candidate generation[J]. ACM sigmod record, 2000, 29(2): 1–12.

  10. PAWLAK Z. Rough sets[J]. International journal of computer & information sciences, 1982, 11(5): 341–356.

  11. PAWLAK Z, GRZYMALA-BUSSE J, SLOWINSKI R, et al. Rough sets[J]. Communications of the ACM, 1995, 38(11): 88–95.

  12. VIGER P F, LIN C W, KIRAN R U, et al. A survey of sequential pattern mining. Data Science and Pattern Recognition, 2017, 1(1): 54–77.

  13. AGRAWAL R, SRIKANT R. Mining sequential patterns[C] // Proceedings of the eleventh international conference on data engineering. IEEE, 1995: 3–14.

  14. WANG H, DING S. Research and Development of Sequential Pattern Mining [J]. Computer Science, 2009, 36(12): 14–17.

  15. SRIKANT R, AGRAWAL R. Mining sequential patterns: Generalizations and performance improvements[C] //International conference on extending database technology. Springer, Berlin, Heidelberg, 1996: 1–17.

  16. ZHANG M, KAO B, YIP C, et al. A GSP-based efficient algorithm for mining frequent sequences[C] //Proc. of International Conference on Artificial Intelligence. Nevada, 2001.

  17. MASSEGLIA F, CATHALA F, PONCELET P. The PSP approach for mining sequential patterns[C] //Proc. of the 2nd European Symposium on Principles of Data Mining and Knowledge Discovery. Berlin: Springer-Verlag, 1998, 1510: 176–184.

  18. ZAKI M J. SPADE: An efficient algorithm for mining frequent sequences[J]. Machine Learning, 2001, 41(1): 31–60.

  19. HAN J, PEI J, MORTAZAVI-ASL B, et al. FreeSpan: frequent pattern projected sequential pattern mining[C] // Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining. New York: ACM Press, 2000: 355–359.

  20. HAN J, PEI J, MORTAZAVI-ASL B, et al. PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth[C] //Proceedings of the 17th international conference on data engineering. Washington DC: IEEE Computer Society, 2001: 215–224.

  21. LIN M Y, LEE S Y. Fast discovery of sequential patterns by memory indexing[C] // Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery. London UK: Springer-Verlag, 2002: 150–160.

  22. SUI Y, SHAO F, SUN R, et al. A Sequential Pattern Mining Algorithm Based on Improved FP-tree, 2008 Ninth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, 2008: 440–444.

  23. INOKUCHI A, WASHIO T, MOTODA H. An apriori-based algorithm for mining frequent substructures from graph data[C] //European conference on principles of data mining and knowledge discovery. Springer, Berlin, Heidelberg, 2000: 13–23.

  24. YAN X, HAN J. gSpan: Graph-based substructure pattern mining[C] //2002 IEEE International Conference on Data Mining, 2002. Proceedings. IEEE, 2002: 721–724.

  25. HUAN J, WANG W, PRINS J. Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism. In 2003 IEEE International Conference on Data Mining, 2003: 549–552.

  26. CHEN Q F, LAN C W, CHEN B S, et al. Exploring consensus RNA substructural patterns using subgraph mining. IEEE/ACM transactions on computational biology and bioinformatics. 2016, 14(5): 1134–1146.

  27. VANETIK N, GUDES E, SHIMONY S E. Computing Frequent Graph Patterns from Semi-structured Data. In 2002 IEEE International Conference on Data Mining, 2002: 458–465.

  28. HU H, YAN X, HUANG Y, et al. Mining Coherent Dense Subgraphs across Massive Biological Networks for Functional Discovery. Bioinformatics, 2005, 21(1): 213–221.

  29. FATTA G D, BERTHOLD M R. High Performance Subgraph Mining in Molecular Compounds[C] //International Conference on High Performance Computing and Communications, Sorrento, Italy, 2005: 866–877.

  30. Zhang Wei. Research on Frequent Subgraph Mining Algorithm[D]. Yanshan University, 2011.

  31. WASHIO T, MOTODA H. State of the art of graph based data mining[J]. Acm Sigkdd Explorations Newsletter, 2003, 5(1): 59–68.

  32. COOK D J, HOLDER L B. Substructure discovery using minimum description length and background knowledge[J]. Journal of Artificial Intelligence Research, 1993: 231–255.

  33. INOKUCHI A, WASHIO T, NISHIMURA K, et al. A Fast Algorithm for Mining Frequent Connected Subgraphs. IBM Research Report. 2002.

  34. KURAMOCHI M, KARYPIS G. Frequent Subgraph Discovery. In Proceedings 2001 IEEE international conference on data mining, 2001: 313–320.

  35. YAN X, HAN J. Closegraph: mining closed frequent graph patterns[C] //Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. 2003: 286–295.

  36. HUAN J, WANG W, PRINS J. Efficient mining of frequent subgraphs in the presence of isomorphism. Third IEEE international conference on data mining. 2003: 549–552.

  37. SAVASERE A, OMIECINSKI E, NAVATHE S. Mining for strong negative associations in a large database of customer transactions[C] //Proceedings 14th International Conference on Data Engineering. IEEE, 1998: 494–502.

  38. WU X, ZHANG C, ZHANG S. Mining both positive and negative association rules[C] //International Conference on Machine Learning. 2002, 2: 658–665.

  39. ANTONIE M L, ZAÏANE O R. Mining positive and negative association rules: An approach for confined rules[C] //European Conference on Principles of Data Mining and Knowledge Discovery. 2004: 27–38.

  40. ZHANG S C, CHEN F, WU X D. Identifying bridging rules between conceptual clusters[C] //Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. 2006: 815–820.

  41. MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[J]. arXiv preprint arXiv:1301.3781, 2013.

  42. Ng P. dna2vec: Consistent vector representations of variable-length k-mers. arXiv preprint arXiv:1701.06279, 2017.

  43. WANG Y, HOU Y, CHE W, et al. From static to dynamic word representations: a survey[J]. International Journal of Machine Learning and Cybernetics, 2020, 11(7): 1611–1630.

  44. McClelland J L, Rumelhart D E, PDP Research Group. Parallel distributed processing[M]. Cambridge, MA: MIT press, 1986.

  45. HUANG F, YATES A. Distributional representations for handling sparsity in supervised sequence-labeling[C] //Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. 2009: 495–503.

  46. DAGAN I, PEREIRA F, LEE L. Similarity-based estimation of word cooccurrence probabilities[J]. arXiv preprint cmp-lg/9405001, 1994.

  47. DEERWESTER S, DUMAIS S T, FURNAS G W, et al. Indexing by latent semantic analysis[J]. Journal of the American society for information science. 1990, 41(6): 391–407.

  48. BLEI D M, NG A, JORDAN M I. Latent dirichlet allocation[J]. The Journal of Machine Learning Research. 2003, 3: 993–1022.

  49. BENGIO Y, DUCHARME R, VINCENT P. A neural probabilistic language model[J]. Advances in Neural Information Processing Systems, 2003: 1137–1155.

  50. PENNINGTON J, SOCHER R, MANNING C D. Glove: global vectors for word representation. Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014: 1532–1543.

  51. MCCANN B, BRADBURY J, XIONG C, et al. Learned in translation: contextualized word vectors. Advances in neural information processing systems. 2017, 30:6294–6305.

  52. PETERS M, NEUMANN M, IYYER M, et al. Deep contextualized word representations. Proceedings of the 2018 conference of the north American chapter of the association for computational linguistics: human language technologies. 2018, 1: 2227–2237.

  53. Heinzinger M, Elnaggar A, Wang Y, et al. Modeling aspects of the language of life through transfer-learning protein sequences. BMC bioinformatics. 2019: 20(1):1–7.

  54. DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of NAACL-HLT. 2019: 4171–4186.

  55. PEARSON K. The problem of the random walk[J]. Nature, 1905, 72(1865): 294.

  56. PAGE L, BRIN S, MOTWANI R, et al. The PageRank citation ranking: Bringing order to the web[R]. Stanford InfoLab, 1999.

  57. PAN J Y, YANG H J, FALOUTSOS C, et al. Automatic multimedia cross-modal correlation discovery[C] //Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining. 2004: 653–658.

  58. SHEN J, DU Y, WANG W, et al. Lazy random walks for superpixel segmentation[J]. IEEE Transactions on Image Processing, 2014, 23(4): 1451–1462.

  59. LOVÁSZ L. Random walks on graphs: A survey, Combinatorics, Paul Erdos Eighty[J]. lecture notes in mathematics, 1993, 2(1): 1–46.

  60. BRAND M. A random walks perspective on maximizing satisfaction and profit[C] //Proceedings of the 2005 SIAM international conference on data mining. Society for Industrial and Applied Mathematics, 2005: 12–19.

  61. GORI M, PUCCI A, ROMA V, et al. Itemrank: A random-walk based scoring algorithm for recommender engines[C]//International Joint Conference on Artificial Intelligence. 2007, 7: 2766–2771.

  62. XIA F, LIU H F, LEE I, et al. Scientific article recommendation: Exploiting common author relations and historical preferences[J]. IEEE Transactions on Big Data, 2016, 2(2): 101–112.

  63. LIU W P, LÜ L Y. Link prediction based on local random walk[J]. EPL (europhysics Letters), 2010, 89(5): 58007.

  64. BACKSTROM L, LESKOVEC J. Supervised random walks: predicting and recommending links in social networks[C] //Proceedings of the fourth ACM international conference on Web search and data mining. 2011: 635–644.

  65. SINGHAL A. Introducing the knowledge graph: things, not strings[Z]. Official Google Blog. 2012.

  66. McCray AT. An upper-level ontology for the biomedical domain. Comparative and Functional genomics. 2003, 4(1): 80–4.

  67. Li Danya, Hu Tiejun, Li Junlian, Qian Qing, Zhu Wenyan. Construction and Application of Chinese Integrated Medical Language System[J]. Journal of Information, 2011,30(02):147–151.

  68. Aodema, Yang Yunfei, Sui Zhifang, et al. A Preliminary Study on the Construction of Chinese Medical Knowledge Graph CMeKG [J]. Chinese Journal of Information, 2019, 33(10): 1–9.

  69. Guan S, Jin X, Jia Y, et al. Research Progress on Knowledge Reasoning Based on Knowledge Graph[J]. Journal of Software, 2018, 29(10): 2966–2994.

  70. CHEN X J, JIA S B, XIANG Y. A review: Knowledge reasoning over knowledge graph[J]. Expert Systems with Applications, 2020, 141: 112948.

  71. SCHOENMACKERS S, DAVIS J, ETZIONI O, et al. Learning first-order horn clauses from web text[C] //Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. 2010: 1088–1098.

  72. NAKASHOLE N, SOZIO M, SUCHANEK F M, et al. Query-time reasoning in uncertain RDF knowledge bases with soft and hard rules[J]. International Conference on Very Large Data Bases, 2012, 884: 15–20.

  73. GALÁRRAGA L A, TEFLIOUDI C, HOSE K, et al. AMIE: association rule mining under incomplete evidence in ontological knowledge bases[C] //Proceedings of the 22nd international conference on World Wide Web. 2013: 413–422.

  74. MITCHELL T, COHEN W, HRUSCHKA E, et al. Never-ending learning[J]. Communications of the ACM, 2018, 61(5): 103–115.

  75. PAULHEIM H, BIZER C. Improving the quality of linked data using statistical distributions[J]. International Journal on Semantic Web and Information Systems (IJSWIS), 2014, 10(2): 63–86.

  76. JANG S, MEGAWATI M, CHOI J, et al. Semi-automatic quality assessment of linked data without requiring ontology[C] //Proceedings of the Third NLP&DBpedia Workshop (NLP & DBpedia 2015) co-located with the 14th International Semantic Web Conference 2015 (ISWC 2015). 2015: 45–55.

  77. WANG W Y, MAZAITIS K, COHEN W W. Programming with personalized pagerank: a locally groundable first-order probabilistic logic[C] //Proceedings of the 22nd ACM international conference on Information & Knowledge Management. 2013: 2129–2138.

  78. CATHERINE R, COHEN W. Personalized recommendations using knowledge graphs: A probabilistic logic programming approach[C] //Proceedings of the 10th ACM conference on recommender systems. 2016: 325–332.

  79. JIANG S P, LOWD D, DOU D J. Learning to refine an automatically extracted knowledge base using markov logic[C] //2012 IEEE 12th International Conference on Data Mining. 2012: 912–917.

  80. CHEN Y, WANG D Z. Knowledge expansion over probabilistic knowledge bases[C] //Proceedings of the 2014 ACM SIGMOD international conference on Management of data. 2014: 649–660.

  81. KUŽELKA O, DAVIS J. Markov logic networks for knowledge base completion: A theoretical analysis under the MCAR assumption[C] //Uncertainty in Artificial Intelligence. PMLR, 2020: 1138–1148.

  82. KIMMIG A, BACH S, BROECHELER M, et al. A short introduction to probabilistic soft logic[C]//Proceedings of the NIPS workshop on probabilistic programming: foundations and applications. 2012: 1–4.

  83. NICKEL M, TRESP V, KRIEGEL H P. A three-way model for collective learning on multi-relational data[C] //International Conference on Machine Learning. 2011.

  84. BORDES A, USUNIER N, GARCIA-DURAN A, et al. Translating embeddings for modeling multi-relational data[J]. Advances in neural information processing systems, 2013, 26.

  85. BORDES A, GLOROT X, WESTON J, et al. Joint learning of words and meaning representations for open-text semantic parsing[C] //Artificial intelligence and statistics. PMLR, 2012: 127–135.

  86. SOCHER R, CHEN D, MANNING C D, et al. Reasoning with neural tensor networks for knowledge base completion[J]. Advances in neural information processing systems, 2013, 26.

  87. CHEN D, SOCHER R, MANNING C D, et al. Learning new facts from knowledge bases with neural tensor networks and semantic word vectors[J]. arXiv preprint arXiv:1301.3618, 2013.

  88. SHI B, WENINGER T. Proje: Embedding projection for knowledge graph completion[C] //Proceedings of the AAAI Conference on Artificial Intelligence. 2017, 31(1).

  89. LIU Q, JIANG H, EVDOKIMOV A, et al. Probabilistic reasoning via deep learning: Neural association models[J]. arXiv preprint arXiv:1603.07704, 2016.

  90. WANG X, SONG X. Design of Network Security Vulnerability Type Correlation Analysis System Based on Knowledge Graph [J]. Electronic Design Engineering, 2021, 29(17):85–89.

  91. GUO J. Research on Association Analysis Method of Aviation Safety Events Based on Knowledge Graph[D]. Civil Aviation University of China, 2020.

  92. LI Y. Construction and application of knowledge graph for natural disaster emergency response [D]. Wuhan University, 2021.

  93. LIU B. Research on Association Analysis Technology of Cyberspace Resources Based on Knowledge Graph [D]. Huazhong University of Science and Technology, 2019.

  94. WANG W. Research on Association Analysis Technology of Distributed Security Events Based on Knowledge Graph [D]. National University of Defense Technology, 2018.

  95. CHEN X. Research on Information Association Analysis Method Based on Knowledge Graph [D]. Harbin Engineering University, 2018.

  96. Wu Jiamin. Construction and analysis of lung cancer medical knowledge map [D]. Ningxia University, 2019.

  97. Nordon G, Koren G, Shalev V, et al. Separating wheat from chaff: Joining biomedical knowledge and patient data for repurposing medications[C] //Proceedings of the AAAI Conference on Artificial Intelligence. 2019, 33(01): 9565–9572.

Copyright information

© 2024 Guangxi Education Publishing House

About this chapter

Cite this chapter

Chen, Q. (2024). Association Analysis: Basic Concepts and Algorithms. In: Association Analysis Techniques and Applications in Bioinformatics. Springer, Singapore. https://doi.org/10.1007/978-981-99-8251-6_2

  • DOI: https://doi.org/10.1007/978-981-99-8251-6_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8250-9

  • Online ISBN: 978-981-99-8251-6

  • eBook Packages: Computer Science, Computer Science (R0)
