A comprehensive survey of data mining

Gupta, Manoj Kumar; Chandra, Pravin

doi:10.1007/s41870-020-00427-7

A comprehensive survey of data mining

Original Research
Published: 06 February 2020

Volume 12, pages 1243–1257, (2020)
Cite this article

International Journal of Information Technology Aims and scope Submit manuscript

4403 Accesses
56 Citations
Explore all metrics

Abstract

Data mining plays an important role in various human activities because it extracts the unknown useful patterns (or knowledge). Due to its capabilities, data mining become an essential task in large number of application domains such as banking, retail, medical, insurance, bioinformatics, etc. To take a holistic view of the research trends in the area of data mining, a comprehensive survey is presented in this paper. This paper presents a systematic and comprehensive survey of various data mining tasks and techniques. Further, various real-life applications of data mining are presented in this paper. The challenges and issues in area of data mining research are also presented in this paper.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

Fayadd U, Piatesky-Shapiro G, Smyth P (1996) From data mining to knowledge discovery in databases. AAAI Press/The MIT Press, Massachusetts Institute of Technology. ISBN 0–262 56097–6 Fayap
Fayadd U, Piatesky-Shapiro G, Smyth P (1996) Knowledge discovery and data mining: towards a unifying framework. In: Proceedings of the 2nd ACM international conference on knowledge discovery and data mining (KDD), Portland, pp 82–88
Heikki M (1996) Data mining: machine learning, statistics, and databases. In: SSDBM ’96: proceedings of the eighth international conference on scientific and statistical database management, June 1996, pp 2–9
Arora RK, Gupta MK (2017) e-Governance using data warehousing and data mining. Int J Comput Appl 169(8):28–31
Google Scholar
Morik K, Bhaduri K, Kargupta H (2011) Introduction to data mining for sustainability. Data Min Knowl Discov 24(2):311–324
Google Scholar
Han J, Kamber M, Pei J (2012) Data mining concepts and techniques, 3rd edn. Elsevier, Netherlands
MATH Google Scholar
Friedman JH (1997) Data mining and statistics: What is the connection? in: Keynote Speech of the 29th Symposium on the Interface: Computing Science and Statistics, Houston, TX, 1997
Turban E, Aronson JE, Liang TP, Sharda R (2007) Decision support and business intelligence systems. 8^th edn, Pearson Education, UK
Gheware SD, Kejkar AS, Tondare SM (2014) Data mining: tasks, tools, techniques and applications. Int J Adv Res Comput Commun Eng 3(10):8095–8098
Google Scholar
Kiranmai B, Damodaram A (2014) A review on evaluation measures for data mining tasks. Int J Eng Comput Sci 3(7):7217–7220
Google Scholar
Sharma M (2014) Data mining: a literature survey. Int J Emerg Res Manag Technol 3(2):1–4
Google Scholar
Venkatadri M, Reddy LC (2011) A review on data mining from past to the future. Int J Comput Appl 15(7):19–22
Google Scholar
Chen M, Han J, Yu PS (1996) Data mining: an overview from a database perspective. IEEE Trans Knowl Data Eng 8(6):866–883
Google Scholar
Gupta MK, Chandra P (2019) A comparative study of clustering algorithms. In: Proceedings of the 13th INDIACom-2019; IEEE Conference ID: 461816; 6th International Conference on “Computing for Sustainable Global Development”
Ponniah P (2001) Data warehousing fundamentals. Wiley, USA
Google Scholar
Chandra P, Gupta MK (2018) Comprehensive survey on data warehousing research. Int J Inform Technol 10(2):217–224
Google Scholar
Weiss SH, Indurkhya N (1998) Predictive data mining: a practical guide. Morgan Kaufmann Publishers, San Francisco
MATH Google Scholar
Fu Y (1997) Data mining: tasks, techniques, and applications. IEEE Potentials 16(4):18–20
Google Scholar
Abuaiadah D (2015) Using bisect k-means clustering technique in the analysis of arabic documents. ACM Trans Asian Low-Resour Lang Inf Process 15(3):1–17
Google Scholar
Algergawy A, Mesiti M, Nayak R, Saake G (2011) XML data clustering: an overview. ACM Comput Surv 43(4):1–25
MATH Google Scholar
Angiulli F, Fassetti F (2013) Exploiting domain knowledge to detect outliers. Data Min Knowl Discov 28(2):519–568
MathSciNet MATH Google Scholar
Angiulli F, Fassetti F (2016) Toward generalizing the unification with statistical outliers: the gradient outlier factor measure. ACM Trans Knowl Discov Data 10(3):1–26
Google Scholar
Bhatnagar V, Ahuja S, Kaur S (2015) Discriminant analysis-based cluster ensemble. Int J Data Min Modell Manag 7(2):83–107
Google Scholar
Bouguessa M (2013) Clustering categorical data in projected spaces. Data Min Knowl Discov 29(1):3–38
MathSciNet Google Scholar
Campello RJGB, Moulavi D, Zimek A, Sander J (2015) Hierarchical density estimates for data clustering, visualization, and outlier detection. ACM Trans Knowl Discov Data 10(1):1–51
Google Scholar
Carpineto C, Osinski S, Romano G, Weiss D (2009) A survey of web clustering engines. ACM Comput. Surv. 41(3):1–38
Google Scholar
Ceglar A, Roddick JF (2006) Association mining. ACM Comput Surv 38(2):1–42
Google Scholar
Chen YL, Weng CH (2009) Mining fuzzy association rules from questionnaire data. Knowl Based Syst 22(1):46–56
Google Scholar
Fan Chin-Yuan, Fan Pei-Shu, Chan Te-Yi, Chang Shu-Hao (2012) Using hybrid data mining and machine learning clustering analysis to predict the turnover rate for technology professionals. Expert Syst Appl 39:8844–8851
Google Scholar
Das R, Kalita J, Bhattacharya (2011) A pattern matching approach for clustering gene expression data. Int J Data Min Model Manag 3(2):130–149
Google Scholar
Dincer E (2006) The k-means algorithm in data mining and an application in medicine. Kocaeli Univesity, Kocaeli
Google Scholar
Geng L, Hamilton HJ (2006) Interestingness measures for data mining: a survey. ACM Comput Surv 38(3):1–32
Google Scholar
Gupta MK, Chandra P (2019) P-k-means: k-means using partition based cluster initialization method. In: Proceedings of the international conference on advancements in computing and management (ICACM 2019), Elsevier SSRN, pp 567–573
Gupta MK, Chandra P (2019) An empirical evaluation of k-means clustering algorithm using different distance/similarity metrics. In: Proceedings of the international conference on emerging trends in information technology (ICETIT-2019), emerging trends in information technology, LNEE 605 pp 884–892 DOI: https://doi.org/10.1007/978-3-030-30577-2_79
Hea Z, Xua X, Huangb JZ, Denga S (2004) Mining class outliers: concepts, algorithms and applications in CRM. Expert Syst Appl 27(4):681e97
Google Scholar
Hung LN, Thu TNT, Nguyen GC (2015) An efficient algorithm in mining frequent itemsets with weights over data stream using tree data structure. IJ Intell Syst Appl 12:23–31
Google Scholar
Hung LN, Thu TNT (2016) Mining frequent itemsets with weights over data stream using inverted matrix. IJ Inf Technol Comput Sci 10:63–71
Google Scholar
Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput. Surv 31(3):1–60
Google Scholar
Jin H, Wang S, Zhou Q, Li Y (2014) An improved method for density-based clustering. Int J Data Min Model Manag 6(4):347–368
Google Scholar
Khandare A, Alvi AS (2017) Performance analysis of improved clustering algorithm on real and synthetic data. IJ Comput Netw Inf Secur 10:57–65
Google Scholar
Koh YS, Ravana SD (2016) Unsupervised rare pattern mining: a survey. ACM Trans Knowl Discov Data 10(4):1–29
Google Scholar
Kosina P, Gama J (2015) Very fast decision rules for classification in data streams. Data Min Knowl Discov 29(1):168–202
MathSciNet Google Scholar
Kotsiantis SB (2007) Supervised machine learning: a review of classification techniques. Informatica 31:249–268
MathSciNet MATH Google Scholar
Kumar D, Bezdek JC, Rajasegarar S, Palaniswami M, Leckie C, Chan J, Gubbi J (2016) Adaptive cluster tendency visualization and anomaly detection for streaming data. ACM Trans Knowl Discov Data 11(2):1–24
Google Scholar
Lee G, Yun U (2017) A new efficient approach for mining uncertain frequent patterns using minimum data structure without false positives. Future Gener Comput Syst 68:89–110
Google Scholar
Li G, Zaki MJ (2015) Sampling frequent and minimal boolean patterns: theory and application in classification. Data Min Knowl Discov 30(1):181–225. https://doi.org/10.1007/s10618-015-0409-y
Article MathSciNet MATH Google Scholar
Liao TW, Triantaphyllou E (2007) Recent advances in data mining of enterprise data: algorithms and applications. World Scientific Publishing, Singapore, pp 111–145
Google Scholar
Mabroukeh NR, Ezeife CI (2010) A taxonomy of sequential pattern mining algorithms. ACM Comput Surv 43:1
Google Scholar
Mampaey M, Vreeken J (2011) Summarizing categorical data by clustering attributes. Data Min Knowl Discov 26(1):130–173
MathSciNet MATH Google Scholar
Menardi G, Torelli N (2012) Training and assessing classification rules with imbalanced data. Data Min Knowl Discov 28(1):4–28. https://doi.org/10.1007/s10618-012-0295-5
Article MathSciNet MATH Google Scholar
Mukhopadhyay A, Maulik U, Bandyopadhyay S (2015) A survey of multiobjective evolutionary clustering. ACM Comput Surv 47(4):1–46
Google Scholar
Pei Y, Fern XZ, Tjahja TV, Rosales R (2016) ‘Comparing clustering with pairwise and relative constraints: a unified framework. ACM Trans Knowl Discov Data 11:2
Google Scholar
Rafalak M, Deja M, Wierzbicki A, Nielek R, Kakol M (2016) Web content classification using distributions of subjective quality evaluations. ACM Trans Web 10:4
Google Scholar
Reddy D, Jana PK (2014) A new clustering algorithm based on Voronoi diagram. Int J Data Min Model Manag 6(1):49–64
Google Scholar
Rustogi S, Sharma M, Morwal S (2017) Improved Parallel Apriori Algorithm for Multi-cores. IJ Inf Technol Comput Sci 4:18–23
Google Scholar
Shah-Hosseini H (2013) Improving K-means clustering algorithm with the intelligent water drops (IWD) algorithm. Int J Data Min Model Manag 5(4):301–317
Google Scholar
Silva JA, Faria ER, Barros RC, Hruschka ER, de Carvalho ACPLF, Gama J (2013) Data stream clustering: a survey. ACM Comput Surv 46(1):1–31
MATH Google Scholar
Silva A, Antunes C (2014) Multi-relational pattern mining over data streams. Data Min Knowl Discov 29(6):1783–1814. https://doi.org/10.1007/s10618-014-0394-6
Article MathSciNet MATH Google Scholar
Sim K, Gopalkrishnan V, Zimek A, Cong G (2012) A survey on enhanced subspace clustering. Data Min Knowl Discov 26(2):332–397
MathSciNet MATH Google Scholar
Sohrabi MK, Roshani R (2017) Frequent itemset mining using cellular learning automata. Comput Hum Behav 68:244–253
Google Scholar
Craw Susan, Wiratunga Nirmalie, Rowe Ray C (2006) Learning adaptation knowledge to improve case-based reasoning. Artif Intell 170:1175–1192
MathSciNet MATH Google Scholar
Tan KC, Teoh EJ, Yua Q, Goh KC (2009) A hybrid evolutionary algorithm for attribute selection in data mining. Expert Syst Appl 36(4):8616–8630
Google Scholar
Tew C, Giraud-Carrier C, Tanner K, Burton S (2013) Behavior-based clustering and analysis of interestingness measures for association rule mining. Data Min Knowl Discov 28(4):1004–1045
MathSciNet MATH Google Scholar
Wang L, Dong M (2015) Exemplar-based low-rank matrix decomposition for data clustering. Data Min Knowl Discov 29:324–357
MathSciNet MATH Google Scholar
Wang F, Sun J (2014) Survey on distance metric learning and dimensionality reduction in data mining. Data Min Knowl Discov 29:534–564
MathSciNet MATH Google Scholar
Wang B, Rahal I, Dong A (2011) Parallel hierarchical clustering using weighted confidence affinity. Int J Data Min Model Manag 3(2):110–129
Google Scholar
Zacharis NZ (2018) Classification and regression trees (CART) for predictive modeling in blended learning. IJ Intell Syst Appl 3:1–9
Google Scholar
Zhang W, Li R, Feng D, Chernikov A, Chrisochoides N, Osgood C, Ji S (2015) Evolutionary soft co-clustering: formulations, algorithms, and applications. Data Min Knowl Discov 29:765–791
MathSciNet MATH Google Scholar
Han J, Fu Y (1996) Exploration of the power of attribute-oriented induction in data mining. Adv Knowl Discov Data Min. AAAI/MIT Press, pp 399-421
Gupta A, Mumick IS (1995) Maintenance of materialized views: problems, techniques, and applications. IEEE Data Eng Bull 18(2):3
Google Scholar
Sawant V, Shah K (2013) A review of distributed data mining using agents. Int J Adv Technol Eng Res 3(5):27–33
Google Scholar
Gupta MK, Chandra P (2019) An efficient approach for selection of initial cluster centroids for k-means clustering algorithm. In: Proceedings international conference on recent developments in science engineering and technology (REDSET-2019), November 15–16 2019
Gupta MK, Chandra P (2019) MP-K-means: modified partition based cluster initialization method for k-means algorithm. Int J Recent Technol Eng 8(4):1140–1148
Google Scholar
Gupta MK, Chandra P (2019) HYBCIM: hypercube based cluster initialization method for k-means. IJ Innov Technol Explor Eng 8(10):3584–3587. https://doi.org/10.35940/ijitee.j9774.0881019
Article Google Scholar
Enke David, Thawornwong Suraphan (2005) The use of data mining and neural networks for forecasting stock market returns. Expert Syst Appl 29:927–940
Google Scholar
Mezyk Edward, Unold Olgierd (2011) Machine learning approach to model sport training. Comput Hum Behav 27:1499–1506
Google Scholar
Esling P, Agon C (2012) Time-series data mining. ACM Comput Surv 45(1):1–34
MATH Google Scholar
Hüllermeier Eyke (2005) Fuzzy methods in machine learning and data mining: status and prospects. Fuzzy Sets Syst 156:387–406
MathSciNet Google Scholar
Hullermeier Eyke (2011) Fuzzy sets in machine learning and data mining. Appl Soft Comput 11:1493–1505
Google Scholar
Gengshen Du, Ruhe Guenther (2014) Two machine-learning techniques for mining solutions of the ReleasePlanner™ decision support system. Inf Sci 259:474–489
Google Scholar
Smith Kate A, Gupta Jatinder ND (2000) Neural networks in business: techniques and applications for the operations researcher. Comput Oper Res 27:1023–1044
MATH Google Scholar
Huang Mu-Jung, Tsou Yee-Lin, Lee Show-Chin (2006) Integrating fuzzy data mining and fuzzy artificial neural networks for discovering implicit knowledge. Knowl Based Syst 19:396–403
Google Scholar
Padhraic S (2000) Data mining: analysis on grand scale. Stat Method Med Res 9(4):309–327. https://doi.org/10.1191/096228000701555181
Article MATH Google Scholar
Saeed S, Ali M (2012) Privacy-preserving back-propagation and extreme learning machine algorithms. Data Knowl Eng 79–80:40–61
Google Scholar
Singh Y, Bhatia PK, Sangwan OP (2007) A review of studies on machine learning techniques. Int J Comput Sci Secur 1(1):70–84
Google Scholar
Yahia ME, El-taher ME (2010) A new approach for evaluation of data mining techniques. Int J Comput Sci Issues 7(5):181–186
Google Scholar
Jackson J (2002) Data mining: a conceptual overview. Commun Assoc Inf Syst 8:267–296
Google Scholar
Heckerman D (1998) A tutorial on learning with Bayesian networks. Learning in graphical models. Springer, Netherlands, pp 301–354
MATH Google Scholar
Politano PM, Walton RO (2017) Statistics & research methodol. Lulu. com
Wetherill GB (1987) Regression analysis with application. Chapman & Hall Ltd, UK
Google Scholar
Anderberg MR (2014) Cluster analysis for applications: probability and mathematical statistics: a series of monographs and textbooks, vol 19. Academic Press, USA
Google Scholar
Mihoci A (2017) Modelling limit order book volume covariance structures. In: Hokimoto T (ed) Advances in statistical methodologies and their application to real problems. IntechOpen, Croatia. https://doi.org/10.5772/66152
Chapter Google Scholar
Thompson B (2004) Exploratory and confirmatory factor analysis: understanding concepts and applications. American Psychological Association, Washington, DC (ISBN:1-59147-093-5)
Google Scholar
Kuzey C, Uyar A, Delen (2014) The impact of multinationality on firm value: a comparative analysis of machine learning techniques. Decis Support Syst 59:127–142
Google Scholar
Chan Philip K, Salvatore JS (1997) On the accuracy of meta-learning for scalable data mining. J Intell Inf Syst 8:5–28
Google Scholar
Tsai Chih-Fong, Hsu Yu-Feng, Lin Chia-Ying, Lin Wei-Yang (2009) Intrusion detection by machine learning: a review. Expert Syst Appl 36:11994–12000
Google Scholar
Liao SH, Chu PH, Hsiao PY (2012) Data mining techniques and applications—a decade review from 2000 to 2011. Expert Syst Appl 39:11303–11311
Google Scholar
Kanevski M, Parkin R, Pozdnukhov A, Timonin V, Maignan M, Demyanov V, Canu S (2004) Environmental data mining and modelling based on machine learning algorithms and geostatistics. Environ Model Softw 19:845–855
Google Scholar
Jain N, Srivastava V (2013) Data mining techniques: a survey paper. Int J Res Eng Technol 2(11):116–119
Google Scholar
Baker RSJ (2010) Data mining for education. In: McGaw B, Peterson P, Baker E (eds) International encyclopedia of education, 3rd edn. Elsevier, Oxford, UK
Google Scholar
Lew A, Mauch H (2006) Introduction to data mining and its applications. Springer, Berlin
Google Scholar
Mukherjee S, Shaw R, Haldar N, Changdar S (2015) A survey of data mining applications and techniques. Int J Comput Sci Inf Technol 6(5):4663–4666
Google Scholar
Data mining examples: most common applications of data mining (2019). https://www.softwaretestinghelp.com/data-mining-examples/. Accessed 27 Dec 2019
Devi SVSG (2013) Applications and trends in data mining. Orient J Comput Sci Technol 6(4):413–419
Google Scholar
Data mining—applications & trends. https://www.tutorialspoint.com/data_mining/dm_applications_trends.htm
Keleş MK (2017) An overview: the impact of data mining applications on various sectors. Tech J 11(3):128–132
Google Scholar
Top 14 useful applications for data mining. https://bigdata-madesimple.com/14-useful-applications-of-data-mining/. Accessed 20 Aug 2014
Yang Q, Wu X (2006) 10 challenging problems in data mining research. Int J Inf Technol Decis Making 5(4):597–604
Google Scholar
Padhy N, Mishra P, Panigrahi R (2012) A survey of data mining applications and future scope. Int J Comput Sci Eng Inf Technol 2(3):43–58
Google Scholar
Gibert K, Sanchez-Marre M, Codina V (2010) Choosing the right data mining technique: classification of methods and intelligent recommendation. In: International Congress on Environment Modelling and Software Modelling for Environment’s Sake, Fifth Biennial Meeting, Ottawa, Canada

Download references

Author information

Authors and Affiliations

University School of Information, Communication and Technology, Guru Gobind Singh Indraprastha University, Sector-16C, Dwarka, Delhi, 110078, India
Manoj Kumar Gupta & Pravin Chandra

Authors

Manoj Kumar Gupta
View author publications
You can also search for this author in PubMed Google Scholar
Pravin Chandra
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Manoj Kumar Gupta.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Gupta, M.K., Chandra, P. A comprehensive survey of data mining. Int. j. inf. tecnol. 12, 1243–1257 (2020). https://doi.org/10.1007/s41870-020-00427-7

Download citation

Received: 29 June 2019
Accepted: 20 January 2020
Published: 06 February 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s41870-020-00427-7

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A comprehensive survey of data mining

Abstract

Access this article

Similar content being viewed by others

A Review of the Development and Future Trends of Data Mining Tools

A Survey on Big Data, Mining: (Tools, Techniques, Applications and Notable Uses)

Data Mining—A Tool for Handling Huge Voluminous Data

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

A comprehensive survey of data mining

Abstract

Access this article

Similar content being viewed by others

A Review of the Development and Future Trends of Data Mining Tools

A Survey on Big Data, Mining: (Tools, Techniques, Applications and Notable Uses)

Data Mining—A Tool for Handling Huge Voluminous Data

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation