Abstract
Data mining, or knowledge discovery in databases, provides the tools to sift through vast data stores to find the trends, patterns, and correlations that can guide strategic decision-making. The chapter highlights the major applications of data mining, their perspectives, the goals of data mining, the evolution of data mining algorithms—for transaction data, data streams, and graph- and text-based data—and the classes of data mining algorithms: prediction methods, clustering, and association rules. This is followed by cluster analysis, the components of the clustering task, pattern representation and feature extraction, similarity measures, and partitional algorithms. Data classification methods such as decision trees and association rule mining are presented with worked examples. Sequential pattern mining algorithms are presented with typical patterns and worked examples. The chapter concludes with scientific applications of data mining, a chapter summary, and a list of practice exercises.
Exercises
1. How is data mining different from querying databases using systems like Oracle or MySQL?
2. What trends in Information Technology (IT) gave birth to the field of data mining? Why did data mining emerge so late compared to databases?
3. Are the goals presented in this chapter identically applicable to all data mining domains (for example, data generated by collisions in particle accelerators, online sales transactions, and Twitter data)? Justify your answer.
4. Given that the Apriori algorithm makes use of prior knowledge of subset support properties:
a. Show that all non-empty subsets of a frequent itemset must also be frequent.
b. Show that the support of any non-empty subset R of an itemset S must be at least as large as the support of S.
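The subset property in exercise 4 (often called anti-monotonicity) can be checked empirically. The following sketch uses a hypothetical five-transaction mini-database (the item names and transactions are illustrative, not from the chapter) and verifies that every non-empty subset of an itemset has support at least as large as the itemset itself:

```python
from itertools import combinations

# Hypothetical mini transaction database; items and baskets are illustrative.
transactions = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "eggs"},
    {"bread", "milk", "eggs"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item of `itemset`."""
    s = set(itemset)
    return sum(1 for t in db if s <= t) / len(db)

S = {"bread", "milk", "eggs"}
# Every non-empty subset R of S satisfies support(R) >= support(S),
# because any transaction containing S necessarily contains R.
for r in range(1, len(S) + 1):
    for R in combinations(sorted(S), r):
        assert support(R, transactions) >= support(S, transactions)
```

This is exactly the property Apriori exploits for pruning: once an itemset is infrequent, no superset of it needs to be counted.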
5.
The algorithms for frequent patterns mining consider only distinct items in a transaction (market basket or shopping basket). However, the multiple occurrences of an item are common in a shopping basket, e.g., we often buy the things like 10 eggs, 3 breads, 2 kg dalia, 4 kg oil, 2 kg milk, etc., and this can be important in transaction data analysis. Suggest an approach on how you will modify the Apriori algorithm, or propose alternate method to efficiently consider multiple occurrences of items?
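One possible direction for exercise 5 (a sketch of an assumption, not the chapter's prescribed answer) is to encode quantities as threshold pseudo-items, so that unmodified Apriori support counting still applies to set-valued transactions. The thresholds below are illustrative:

```python
# Sketch: convert each quantified purchase into threshold pseudo-items
# such as ("eggs", 1), ("eggs", 5), ... meaning "at least that many eggs".
# Standard Apriori can then run on the resulting sets unchanged.

def to_pseudo_items(basket, thresholds=(1, 2, 5, 10)):
    """basket: dict mapping item -> quantity. Returns a set of
    (item, threshold) pseudo-items for every threshold the quantity meets."""
    pseudo = set()
    for item, qty in basket.items():
        for t in thresholds:
            if qty >= t:
                pseudo.add((item, t))
    return pseudo

basket = {"eggs": 10, "bread": 3, "oil": 4}
print(sorted(to_pseudo_items(basket)))
```

A frequent pseudo-itemset such as {("eggs", 10), ("bread", 2)} then reads as "baskets with at least 10 eggs and at least 2 breads are frequent", which recovers quantity information without changing the mining algorithm itself.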
6. Assume nonnegative prices of items in a store. Determine the nature of the constraint represented in each of the following cases, and suggest how you would mine association rules under it.
a. At least one Sudoku game.
b. Itemsets whose sum of prices is less than \(\$250\).
c. One free item, together with other items whose sum of prices is \(\$300\) or more.
d. The average price of all the items is between \(\$200\) and \(\$500\).
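To make the constraint-classification question concrete, here is a small sketch of case (b) with a hypothetical price table. "Sum of prices < limit" behaves anti-monotonically under nonnegative prices: once an itemset violates it, every superset does too, so it can be pushed into Apriori-style pruning just like minimum support:

```python
# Illustrative price table; all items and values are hypothetical.
price = {"game": 40, "mouse": 60, "keyboard": 90, "monitor": 200}

def sum_lt(itemset, limit=250):
    """Anti-monotone constraint: total price of the itemset < limit.
    With nonnegative prices, adding items can only raise the sum,
    so a failing itemset can never be rescued by growing it."""
    return sum(price[i] for i in itemset) < limit

assert sum_lt({"game", "mouse"})            # 100 < 250
assert not sum_lt({"keyboard", "monitor"})  # 290 >= 250
# A superset of a failing itemset also fails (anti-monotonicity):
assert not sum_lt({"keyboard", "monitor", "game"})
```

Cases (a), (c), and (d) behave differently (monotone, mixed, and convertible-style constraints, respectively), which is what the exercise asks you to work out.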
7.
Given a decision tree, you have the following options:
-
a.
Convert the decision tree into rules, and then prune the resulting rules,
-
b.
Prune the decision tree and then convert the pruned tree into rules.
Critically analyze both the approaches, and discuss their merits and demerits.
-
a.
-
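The tree-to-rules step in option (a) simply reads off one IF-THEN rule per root-to-leaf path. A minimal sketch, using a hypothetical weather-style tree encoded as nested tuples (the attributes and labels are illustrative, not from the chapter):

```python
# Hypothetical tiny decision tree: (attribute, {value: subtree_or_label}).
tree = ("outlook", {
    "sunny": ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rain": ("wind", {"strong": "no", "weak": "yes"}),
})

def tree_to_rules(node, conditions=()):
    """Each root-to-leaf path becomes one IF-THEN rule. The resulting
    rules can then be pruned independently of one another, which is
    the key difference from pruning the tree first."""
    if isinstance(node, str):            # leaf: class label
        return [(conditions, node)]
    attr, branches = node
    rules = []
    for value, child in branches.items():
        rules += tree_to_rules(child, conditions + ((attr, value),))
    return rules

for conds, label in tree_to_rules(tree):
    print(" AND ".join(f"{a}={v}" for a, v in conds), "=>", label)
```

Note that rule pruning can drop a single condition from one rule without affecting the others, whereas tree pruning must remove a whole subtree; this asymmetry is the heart of the comparison the exercise asks for.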
8.
Find out the worst case time complexity of decision-tree algorithm. Assume that data set is D, each item has n number of attributes, and the number of training tuples are |D|. (Hint. Answer is \(|D|.n. \log |D|\).)
9. It is required to cluster a given set of data into three clusters, where (x, y) represents the location of an object and the distance function is the Euclidean distance. The points are \(P_1(3, 12),\) \(P_2(3, 8),\) \(P_3(9, 5),\) \(Q_1(4, 7),\) \(Q_2(6, 4),\) \(Q_3(7, 5),\) \(R_1(2, 3),\) and \(R_2(5, 8)\). Use the k-means algorithm to show the three cluster centers after the first round of execution and after the final round.
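A minimal k-means sketch for checking work on exercise 9. The initialization is an assumption (the exercise does not fix the starting centers); here \(P_1\), \(Q_1\), and \(R_1\) seed the three clusters, and the loop alternates assignment and center-update steps until the centers stop moving:

```python
# Exercise data: P1, P2, P3, Q1, Q2, Q3, R1, R2.
points = [(3, 12), (3, 8), (9, 5), (4, 7), (6, 4), (7, 5), (2, 3), (5, 8)]

def kmeans(pts, centers, rounds=10):
    for _ in range(rounds):
        # Assignment step: each point joins its nearest center
        # (squared Euclidean distance gives the same nearest center).
        clusters = [[] for _ in centers]
        for p in pts:
            d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[d.index(min(d))].append(p)
        # Update step: each center moves to the mean of its cluster.
        new = [tuple(sum(x) / len(c) for x in zip(*c)) if c else centers[i]
               for i, c in enumerate(clusters)]
        if new == centers:               # converged
            break
        centers = new
    return centers, clusters

# Assumed initialization: first point of each named group.
centers, clusters = kmeans(points, [(3, 12), (4, 7), (2, 3)])
print(centers)
```

With this particular initialization, \(P_1\) and \(R_1\) end up as singleton clusters and the remaining six points share one center; other initializations can produce different partitions, which is part of what the exercise illustrates.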
Copyright information
© 2020 Springer Nature India Private Limited
Cite this chapter
Chowdhary, K.R. (2020). Data Mining. In: Fundamentals of Artificial Intelligence. Springer, New Delhi. https://doi.org/10.1007/978-81-322-3972-7_17
Print ISBN: 978-81-322-3970-3
Online ISBN: 978-81-322-3972-7