Fuzzy Clustering with Prototype Extraction for Census Data Analysis
Abstract
Primary census data have recently become publicly available. This opens qualitatively new perspectives not only for researchers in demography and sociology, but also for anyone who deals with processes occurring in society.
In this paper, the authors propose using Data Mining methods to search for hidden interconnections in census data. A novel clustering-based technique is also described. It makes it possible to determine factors that influence people's behavior, in particular the decision-making process (for example, the decision whether or not to have a baby). The proposed technique belongs to contrast mining, as it is based on dividing the whole set of respondents into two contrasting groups. The first group consists of those who possess a certain feature (for instance, have a baby), whereas the members of the second group do not. We propose to define clustering-based subgroups within the first group and to extract their prototypes from the second one. By analyzing the characteristics of the subgroups and of their prototypes, it is possible to identify which factors influence the decision-making process. The authors also provide an experimental example of the described approach, which additionally shows that fuzzy clustering provides more accurate results than hard clustering techniques.
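The idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes plain fuzzy c-means for the clustering step and, as a further assumption, treats the prototypes of a subgroup as the members of the second group that lie closest to the subgroup's cluster centre. The data are synthetic, and the names `fuzzy_c_means`, `extract_prototypes`, `with_feature` and `without_feature` are hypothetical.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Plain fuzzy c-means. Returns (centres, U), where U has shape (n, c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)            # each row of U sums to 1
    for _ in range(n_iter):
        W = U ** m                               # fuzzified memberships
        centres = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                    # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centres, U

def extract_prototypes(centres, X_other, k=1):
    """Assumed rule: for each centre, indices of the k nearest rows of X_other."""
    d = np.linalg.norm(X_other[None, :, :] - centres[:, None, :], axis=2)
    return np.argsort(d, axis=1)[:, :k]

# Synthetic stand-in for census records (two numeric attributes per respondent).
rng = np.random.default_rng(42)
with_feature = np.vstack([rng.normal([0, 0], 0.3, (50, 2)),
                          rng.normal([5, 5], 0.3, (50, 2))])
without_feature = np.vstack([rng.normal([0.5, 0.5], 0.3, (30, 2)),
                             rng.normal([4.5, 4.5], 0.3, (30, 2))])

# Subgroups come from the first group, prototypes from the second one.
centres, U = fuzzy_c_means(with_feature, c=2)
prototypes = extract_prototypes(centres, without_feature, k=3)

for j, idx in enumerate(prototypes):
    print(f"subgroup {j}: centre={centres[j].round(2)}, prototype rows={idx.tolist()}")
```

Comparing the attribute values of each subgroup with those of its prototypes is then the step at which influencing factors would be read off. Replacing the membership matrix `U` by a hard assignment (for example `U.argmax(axis=1)`) gives a rough hard-clustering counterpart for comparison.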
Keywords
Fuzzy Cluster, Data Mining Technique, Subtractive Cluster, Subtractive Algorithm, Membership Matrix