Data Mining

Chapter in Fundamentals of Artificial Intelligence

Abstract

Data mining, or knowledge discovery in databases, provides tools to sift through vast data stores and find the trends, patterns, and correlations that can guide strategic decision-making. The chapter highlights the major applications of data mining and their perspectives, the goals of data mining, the evolution of data mining algorithms (for transaction data, data streams, graphs, and text-based data), and the classes of data mining algorithms: prediction methods, clustering, and association rules. These are followed by cluster analysis, the components of a clustering task, pattern representation and feature extraction, similarity measures, and partitional algorithms. Decision-tree classification and association rule mining are presented with worked examples. Sequential pattern mining algorithms are presented with typical patterns and worked examples. The chapter concludes with scientific applications of data mining, a chapter summary, and a list of practice exercises.

Author information

Correspondence to K. R. Chowdhary.

Exercises

  1. How does data mining differ from querying databases such as Oracle or MySQL?

  2. Which trends in information technology (IT) gave rise to the field of data mining? Why did data mining emerge so much later than databases?

  3. Do the goals presented in this chapter apply equally to all data mining domains, for example, to mining the data generated by collisions in particle accelerators, online sales transactions, and Twitter data? Justify your answer.

  4. The Apriori algorithm makes use of prior knowledge of subset support properties. Given this:

    a. Show that all non-empty subsets of a frequent itemset must also be frequent.

    b. Show that the support of any non-empty subset R of an itemset S must be at least as large as the support of S.

  5. Algorithms for frequent pattern mining consider only the distinct items in a transaction (a market basket or shopping basket). However, multiple occurrences of an item are common in a shopping basket; for example, we often buy 10 eggs, 3 loaves of bread, 2 kg of dalia, 4 kg of oil, 2 kg of milk, and so on, and these quantities can be important in transaction data analysis. Suggest how you would modify the Apriori algorithm, or propose an alternative method, to account efficiently for multiple occurrences of items.

  6. Assume that item prices in a store are nonnegative. For each of the following cases, determine the type of constraint it represents, and suggest how you would mine association rules under it.

    a. The itemset contains at least one Sudoku game.

    b. The sum of the prices of the items in the itemset is less than \(\$250\).

    c. The itemset contains one free item, and the sum of the prices of the other items is at least \(\$300\).

    d. The average price of the items in the itemset is between \(\$200\) and \(\$500\).

  7. Given a decision tree, you have the following options:

    a. Convert the decision tree into rules, and then prune the resulting rules.

    b. Prune the decision tree, and then convert the pruned tree into rules.

    Critically analyze both approaches, and discuss their merits and demerits.

  8. Find the worst-case time complexity of the decision-tree induction algorithm. Assume that the data set is D, each tuple has n attributes, and the number of training tuples is |D|. (Hint: the answer is \(|D| \cdot n \cdot \log |D|\).)

  9. A given set of data is to be clustered into three clusters, where \((x, y)\) represents the location of an object and the distance function is Euclidean distance. The points are \(P_1(3, 12)\), \(P_2(3, 8)\), \(P_3(9, 5)\), \(Q_1(4, 7)\), \(Q_2(6, 4)\), \(Q_3(7, 5)\), \(R_1(2, 3)\), and \(R_2(5, 8)\). Use the k-means algorithm to show the three cluster centers after the first round of execution and after the final round.

Copyright information

© 2020 Springer Nature India Private Limited

About this chapter

Cite this chapter

Chowdhary, K.R. (2020). Data Mining. In: Fundamentals of Artificial Intelligence. Springer, New Delhi. https://doi.org/10.1007/978-81-322-3972-7_17

  • DOI: https://doi.org/10.1007/978-81-322-3972-7_17

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-3970-3

  • Online ISBN: 978-81-322-3972-7

  • eBook Packages: Computer Science, Computer Science (R0)