Abstract
Data mining, or knowledge discovery in databases, provides the tools to sift through vast data stores to find the trends, patterns, and correlations that can guide strategic decision-making. The chapter highlights the major applications of data mining, their perspectives, the goals of data mining, the evolution of data mining algorithms—for transaction data, data streams, and graph- and text-based data—and the classes of data mining algorithms: prediction methods, clustering, and association rules. This is followed by cluster analysis, the components of the clustering task, pattern representation and feature extraction, similarity measures, and partitional algorithms. Data classification methods such as decision trees and association rule mining are presented with worked examples. Sequential pattern mining algorithms are presented with typical patterns and worked examples. The chapter concludes with scientific applications of data mining, a chapter summary, and a list of practice exercises.
Exercises
1. How is data mining different from querying databases using systems like Oracle or MySQL?
2. What trends in Information Technology (IT) gave birth to the field of data mining? Why did data mining emerge so late compared to databases?
3. Are the goals presented in this chapter identically applicable to all data mining domains (for example, data generated by collisions in particle accelerators, online sales transactions, and Twitter data)? Justify your answer.
4. Given that the Apriori algorithm makes use of prior knowledge of subset support properties:
a. Show that all non-empty subsets of a frequent itemset must also be frequent.
b. Show that the support of any non-empty subset R of an itemset S must be at least as large as the support of S.
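The subset property in exercise 4 (often called anti-monotonicity) can be checked empirically. The following sketch uses a hypothetical five-transaction mini-database (the item names and transactions are illustrative, not from the chapter) and verifies that every non-empty subset of an itemset has support at least as large as the itemset itself:

```python
from itertools import combinations

# Hypothetical mini transaction database; items and baskets are illustrative.
transactions = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "eggs"},
    {"bread", "milk", "eggs"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item of `itemset`."""
    s = set(itemset)
    return sum(1 for t in db if s <= t) / len(db)

S = {"bread", "milk", "eggs"}
# Every non-empty subset R of S satisfies support(R) >= support(S),
# because any transaction containing S necessarily contains R.
for r in range(1, len(S) + 1):
    for R in combinations(sorted(S), r):
        assert support(R, transactions) >= support(S, transactions)
```

This is exactly the property Apriori exploits for pruning: once an itemset is infrequent, no superset of it needs to be counted.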
5.
The algorithms for frequent patterns mining consider only distinct items in a transaction (market basket or shopping basket). However, the multiple occurrences of an item are common in a shopping basket, e.g., we often buy the things like 10 eggs, 3 breads, 2 kg dalia, 4 kg oil, 2 kg milk, etc., and this can be important in transaction data analysis. Suggest an approach on how you will modify the Apriori algorithm, or propose alternate method to efficiently consider multiple occurrences of items?
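One possible direction for exercise 5 (a sketch of an assumption, not the chapter's prescribed answer) is to encode quantities as threshold pseudo-items, so that unmodified Apriori support counting still applies to set-valued transactions. The thresholds below are illustrative:

```python
# Sketch: convert each quantified purchase into threshold pseudo-items
# such as ("eggs", 1), ("eggs", 5), ... meaning "at least that many eggs".
# Standard Apriori can then run on the resulting sets unchanged.

def to_pseudo_items(basket, thresholds=(1, 2, 5, 10)):
    """basket: dict mapping item -> quantity. Returns a set of
    (item, threshold) pseudo-items for every threshold the quantity meets."""
    pseudo = set()
    for item, qty in basket.items():
        for t in thresholds:
            if qty >= t:
                pseudo.add((item, t))
    return pseudo

basket = {"eggs": 10, "bread": 3, "oil": 4}
print(sorted(to_pseudo_items(basket)))
```

A frequent pseudo-itemset such as {("eggs", 10), ("bread", 2)} then reads as "baskets with at least 10 eggs and at least 2 breads are frequent", which recovers quantity information without changing the mining algorithm itself.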
6. Assume nonnegative prices of items in a store. Determine the nature of the constraint represented in each of the following cases, and suggest how you would mine association rules under it.
a. At least one Sudoku game.
b. Itemsets whose sum of prices is less than \(\$250\).
c. One free item, together with other items whose sum of prices is \(\$300\) or more.
d. The average price of all the items is between \(\$200\) and \(\$500\).
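To make the constraint-classification question concrete, here is a small sketch of case (b) with a hypothetical price table. "Sum of prices < limit" behaves anti-monotonically under nonnegative prices: once an itemset violates it, every superset does too, so it can be pushed into Apriori-style pruning just like minimum support:

```python
# Illustrative price table; all items and values are hypothetical.
price = {"game": 40, "mouse": 60, "keyboard": 90, "monitor": 200}

def sum_lt(itemset, limit=250):
    """Anti-monotone constraint: total price of the itemset < limit.
    With nonnegative prices, adding items can only raise the sum,
    so a failing itemset can never be rescued by growing it."""
    return sum(price[i] for i in itemset) < limit

assert sum_lt({"game", "mouse"})            # 100 < 250
assert not sum_lt({"keyboard", "monitor"})  # 290 >= 250
# A superset of a failing itemset also fails (anti-monotonicity):
assert not sum_lt({"keyboard", "monitor", "game"})
```

Cases (a), (c), and (d) behave differently (monotone, mixed, and convertible-style constraints, respectively), which is what the exercise asks you to work out.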
7.
Given a decision tree, you have the following options:
-
a.
Convert the decision tree into rules, and then prune the resulting rules,
-
b.
Prune the decision tree and then convert the pruned tree into rules.
Critically analyze both the approaches, and discuss their merits and demerits.
-
a.
-
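The tree-to-rules step in option (a) simply reads off one IF-THEN rule per root-to-leaf path. A minimal sketch, using a hypothetical weather-style tree encoded as nested tuples (the attributes and labels are illustrative, not from the chapter):

```python
# Hypothetical tiny decision tree: (attribute, {value: subtree_or_label}).
tree = ("outlook", {
    "sunny": ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rain": ("wind", {"strong": "no", "weak": "yes"}),
})

def tree_to_rules(node, conditions=()):
    """Each root-to-leaf path becomes one IF-THEN rule. The resulting
    rules can then be pruned independently of one another, which is
    the key difference from pruning the tree first."""
    if isinstance(node, str):            # leaf: class label
        return [(conditions, node)]
    attr, branches = node
    rules = []
    for value, child in branches.items():
        rules += tree_to_rules(child, conditions + ((attr, value),))
    return rules

for conds, label in tree_to_rules(tree):
    print(" AND ".join(f"{a}={v}" for a, v in conds), "=>", label)
```

Note that rule pruning can drop a single condition from one rule without affecting the others, whereas tree pruning must remove a whole subtree; this asymmetry is the heart of the comparison the exercise asks for.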
8.
Find out the worst case time complexity of decision-tree algorithm. Assume that data set is D, each item has n number of attributes, and the number of training tuples are |D|. (Hint. Answer is \(|D|.n. \log |D|\).)
9. It is required to cluster a given set of data into three clusters, where (x, y) represents the location of an object and the distance function is the Euclidean distance. The points are \(P_1(3, 12),\) \(P_2(3, 8),\) \(P_3(9, 5),\) \(Q_1(4, 7),\) \(Q_2(6, 4),\) \(Q_3(7, 5),\) \(R_1(2, 3),\) and \(R_2(5, 8)\). Use the k-means algorithm to show the three cluster centers after the first round of execution and after the final round.
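A minimal k-means sketch for checking work on exercise 9. The initialization is an assumption (the exercise does not fix the starting centers); here \(P_1\), \(Q_1\), and \(R_1\) seed the three clusters, and the loop alternates assignment and center-update steps until the centers stop moving:

```python
# Exercise data: P1, P2, P3, Q1, Q2, Q3, R1, R2.
points = [(3, 12), (3, 8), (9, 5), (4, 7), (6, 4), (7, 5), (2, 3), (5, 8)]

def kmeans(pts, centers, rounds=10):
    for _ in range(rounds):
        # Assignment step: each point joins its nearest center
        # (squared Euclidean distance gives the same nearest center).
        clusters = [[] for _ in centers]
        for p in pts:
            d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[d.index(min(d))].append(p)
        # Update step: each center moves to the mean of its cluster.
        new = [tuple(sum(x) / len(c) for x in zip(*c)) if c else centers[i]
               for i, c in enumerate(clusters)]
        if new == centers:               # converged
            break
        centers = new
    return centers, clusters

# Assumed initialization: first point of each named group.
centers, clusters = kmeans(points, [(3, 12), (4, 7), (2, 3)])
print(centers)
```

With this particular initialization, \(P_1\) and \(R_1\) end up as singleton clusters and the remaining six points share one center; other initializations can produce different partitions, which is part of what the exercise illustrates.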
Copyright information
© 2020 Springer Nature India Private Limited
Cite this chapter
Chowdhary, K.R. (2020). Data Mining. In: Fundamentals of Artificial Intelligence. Springer, New Delhi. https://doi.org/10.1007/978-81-322-3972-7_17
Print ISBN: 978-81-322-3970-3
Online ISBN: 978-81-322-3972-7