Research on Wine Analysis Based on Data Preprocessing
In an era of explosive data growth, data preprocessing is particularly important for extracting information from massive data sets. In this paper, data preprocessing was implemented on the wine data from the UCI repository by building models for missing data imputation, duplicate value removal, outlier detection, data standardization, and data reduction. The preprocessed data were then compared with the raw data using the K-means algorithm, a linear regression model, and a decision tree classification algorithm. The experimental results showed that after preprocessing, the clustering error was significantly reduced, the goodness of fit of the linear regression model increased, and the classification accuracy of the decision tree was higher. These results demonstrate the importance of data preprocessing and may offer some reference value for optimizing data processing.
Keywords: Data preprocessing; Missing data imputation; Duplicate value removal; Outlier detection; Data standardization; Data reduction
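The preprocessing steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which imputation, outlier, or standardization methods were used, so column-mean imputation, the 1.5 × IQR rule, and z-score standardization are assumptions made here for illustration. All function names are hypothetical, the data are assumed to be rows of numeric values with `None` marking a missing entry, and the final reduction step is omitted.

```python
from statistics import mean, pstdev, quantiles


def impute_mean(rows):
    """Replace None entries with the mean of the observed values in that column (assumed method)."""
    cols = list(zip(*rows))
    means = [mean(v for v in col if v is not None) for col in cols]
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def drop_iqr_outliers(rows, k=1.5):
    """Drop rows with any value outside [Q1 - k*IQR, Q3 + k*IQR] of its column (assumed rule)."""
    bounds = []
    for col in zip(*rows):
        q1, _, q3 = quantiles(col, n=4)
        iqr = q3 - q1
        bounds.append((q1 - k * iqr, q3 + k * iqr))
    return [row for row in rows
            if all(lo <= v <= hi for v, (lo, hi) in zip(row, bounds))]


def standardize(rows):
    """Z-score each column: subtract the column mean, divide by the population standard deviation."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c)) for c in cols]
    return [[(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]
            for row in rows]


def preprocess(rows):
    """Apply the steps in the order listed in the abstract (reduction step omitted)."""
    return standardize(drop_iqr_outliers(drop_duplicates(impute_mean(rows))))
```

Calling `preprocess` on a list of numeric rows returns a cleaned list in which every column has zero mean and unit variance. That output could then be passed to a clustering, regression, or classification routine and compared against the same routine run on the raw rows.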
This paper is partially supported by the National Natural Science Foundation of China (No. 61563044, 61866031); the National Natural Science Foundation of Qinghai Province (No. 2017-ZJ-902); the Applied Basic Research Programs of the Science and Technology Department of Sichuan Province (No. 2019YJ0110); the Youth Foundation of Qinghai University (No. 2017-QGY-4, 2018-QGY-7); the Teaching Research Project of Qinghai University (No. KC18038, SZ18015, JY201805); and the Open Research Fund Program of the State Key Laboratory of Hydroscience and Engineering (No. sklhse-2017-A-05).