Association

  • Chapter
Data Analytics

Abstract

This final chapter presents the theoretical foundations of event association analysis and the main techniques used to carry it out. Like the previous chapters, it is structured in three subsections.

Notes

  1.

    In some texts, especially in statistics, association analysis is also called dependency analysis, and in data science texts it is called pattern search. The Dictionary of the Spanish Language [1] defines dependence as: “3. f. Relation of origin or connection.”

  2.

    We repeat here that, to get the best learning results from this book, it is very important that readers try to solve the exercises on their own before looking at the solutions, and check whether their solutions are correct only once the exercises have been solved.

  3.

    There are others, such as lift.

  4.

    To avoid repeating the phrase “search for patterns or association of disjoint events” throughout the text, from here on we will use only “association of events,” as it is the most widely used term.

  5.

    It is very important to bear in mind that the definition of probability used in association studies is the classic one; from here on, whenever we refer to probability, we mean the classic definition.

  6.

    The equation is written for m×n tables, that is, for the case where the two subsets of the sample space consist of n mutually exclusive events (the first) and m mutually exclusive events (the second).

  7.

    Textbooks have traditionally described contingency in terms of characteristics and their possible values. Defining it in terms of elementary events is new to this book.

  8.

    For example, the phi contingency coefficient.

  9.

    The contingency coefficient can also be used to determine the probability of dependence of the two characteristics through Pearson’s χ² distribution function: for a value of χ² with ν degrees of freedom and a significance level α, the characteristics are considered independent when

    $$ \chi^{2} < \chi^{2}_{\alpha,\nu} $$

  10.

    Disjoint events are those that have no elementary events in common.

  11.

    To avoid repeating the phrase “search for patterns or association of disjoint events” throughout the text, from here on we will use only “association of events,” as it is the most widely used term.

  12.

    It is very important to bear in mind that the definition of probability used in association studies is the classic one; from here on, whenever we refer to probability, we mean the classic definition.

  13.

    We use the usual shopping basket example because it is very pedagogical, although we coincide with other texts only in the domain, since the rest of the discussion takes a quite different approach.

  14.

    We state a sense here because, as explained above, for confidence it is essential to indicate the sense of the association. In this case, what is being analysed is the confidence of the association between the events Bread, Water, and Milk, from the perspective of how the appearance of Bread and Water is associated with Milk, that is, to what degree Milk is present or absent when Bread and Water are present.

  15.

    We write here a footnote because, as explained above, for confidence it is essential to indicate the sense of the association and in this case what is being analysed is the confidence of the association between the events Bread, Water, and Milk, but from the perspective of knowing what association there is the appearance of.

  16.

    Not all existing algorithms will be covered, since that would require a monograph devoted to association; only the most widely used and internationally disseminated algorithm will be presented.

  17.

    We will call them A and B for clarity.

  18.

    Following what has been seen above, we will also call these A and B here, for clarity.

  19.

    A function (here we take the support function s, because it is the one we are working with, but it could be any other) is antimonotonic on a set P(E) when it satisfies

    $$ \forall\, \mathrm{A},\mathrm{B}\in \mathrm{P}(\mathrm{E}) : \mathrm{A}\subseteq \mathrm{B}\Rightarrow \mathrm{s}(\mathrm{B})\le \mathrm{s}(\mathrm{A}) $$
  20.

    We repeat the sample here to make reading easier and avoid having to go back: {Bread, Water, Milk, Oranges}, {Bread, Water, Coffee, Milk}, {Bread, Water, Milk}, {Bread, Coffee, Milk}, {Bread, Water}, {Milk}.

  21.

    How a hash tree is built will be explained through the example, because this is clearer.

  22.

    All the data are real data.

  23.

    The name of the planet will be used only to identify the event and will not be considered an additional variable.

  24.

    Only “Without” or “With” will be written, omitting the word “satellites,” to make the text easier to read.

  25.

    We changed the numbering when naming the events for clarity. Event numbering is arbitrary and can be changed at any time.

  26.

    Do not forget that this event appears twice.

  27.

    With no space between the name “Event” and the event number.
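
As a minimal sketch of the measures these notes refer to (support in note 19, confidence and its sense in note 14, lift in note 3), the following Python snippet computes them on the sample repeated in note 20. It assumes the standard textbook definitions, with probability taken in the classic sense of note 5 (favourable cases over possible cases); the chapter's own formulation may differ in detail. The function names are illustrative and are not taken from the book.

```python
from fractions import Fraction
from itertools import combinations

# Sample repeated in note 20; each set is one shopping basket.
BASKETS = [
    {"Bread", "Water", "Milk", "Oranges"},
    {"Bread", "Water", "Coffee", "Milk"},
    {"Bread", "Water", "Milk"},
    {"Bread", "Coffee", "Milk"},
    {"Bread", "Water"},
    {"Milk"},
]


def support(itemset, baskets=BASKETS):
    """Classic probability: baskets containing the itemset / all baskets."""
    itemset = set(itemset)
    hits = sum(1 for basket in baskets if itemset <= basket)
    return Fraction(hits, len(baskets))


def confidence(antecedent, consequent, baskets=BASKETS):
    """Directed measure s(A ∪ B) / s(A); swapping A and B changes the result.

    Assumes the antecedent occurs at least once (otherwise division by zero).
    """
    joint = support(set(antecedent) | set(consequent), baskets)
    return joint / support(antecedent, baskets)


def lift(antecedent, consequent, baskets=BASKETS):
    """Confidence of A -> B relative to the support of B (assumed definition)."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


def is_antimonotonic(baskets=BASKETS):
    """Check s(B) <= s(A) for every pair of itemsets with A ⊆ B."""
    items = sorted(set().union(*baskets))
    subsets = [
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    ]
    return all(
        support(b, baskets) <= support(a, baskets)
        for a in subsets
        for b in subsets
        if a <= b
    )


if __name__ == "__main__":
    print(support({"Bread", "Water"}))               # 2/3
    print(confidence({"Bread", "Water"}, {"Milk"}))  # 3/4
    print(confidence({"Milk"}, {"Bread", "Water"}))  # 3/5
    print(lift({"Bread", "Water"}, {"Milk"}))        # 9/10
    print(is_antimonotonic())                        # True
```

On this sample the confidence of {Bread, Water} → {Milk} is 3/4, whereas in the opposite sense it is 3/5, which illustrates why the sense of the association has to be stated. Exact fractions are used instead of floats so that the comparisons in the antimonotonicity check are not affected by rounding.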

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Cuadrado-Gallego, J.J., Demchenko, Y. (2023). Association. In: Data Analytics. Springer, Cham. https://doi.org/10.1007/978-3-031-39129-3_7

  • DOI: https://doi.org/10.1007/978-3-031-39129-3_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-39128-6

  • Online ISBN: 978-3-031-39129-3

  • eBook Packages: Computer Science (R0)