Fuzzy Clustering with Prototype Extraction for Census Data Analysis

Abstract

Primary census data have recently become publicly available. This opens qualitatively new perspectives not only for researchers in demography and sociology, but also for anyone who deals with processes occurring in society.

In this paper, the authors propose using Data Mining methods to search for hidden interconnections in census data. A novel clustering-based technique is also described. It makes it possible to determine factors that influence people's behavior, in particular the decision-making process (for example, the decision whether or not to have a baby). The proposed technique is a form of contrast mining, as it is based on dividing the whole set of respondents into two contrasting groups. The first group consists of those who possess a certain feature (for instance, have a baby); members of the second group do not. We propose to define clustering-based subgroups within the first group and to extract their prototypes from the second. By analyzing the characteristics of the subgroups and of their prototypes, it is possible to identify which factors influence the decision-making process. The authors also provide an experimental example of the described approach, which additionally shows that fuzzy clustering provides more accurate results than hard clustering techniques.
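
The two-group idea can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, the distance measure, or the rule for selecting prototypes, so everything below is an assumption made for illustration: plain fuzzy c-means with Euclidean distance is used to form subgroups in the first group, and the prototypes of each subgroup are taken to be the k members of the second group closest to its center. The function names `fuzzy_c_means` and `extract_prototypes`, the parameter values, and the synthetic two-attribute data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means (assumed here; the paper may use another variant).

    Returns (centers, U), where U[i, j] is the membership of row i in cluster j.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalised so each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the data.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def extract_prototypes(centers, contrast_group, k=5):
    """For each subgroup center, return indices of the k nearest rows
    of the contrasting group (an assumed notion of 'prototype')."""
    dist = np.linalg.norm(
        contrast_group[None, :, :] - centers[:, None, :], axis=2
    )
    return np.argsort(dist, axis=1)[:, :k]


# Synthetic stand-in for census records with two numeric attributes.
rng = np.random.default_rng(42)
with_feature = np.vstack([
    rng.normal([0, 0], 0.3, (50, 2)),
    rng.normal([5, 5], 0.3, (50, 2)),
])
without_feature = np.vstack([
    rng.normal([0, 0], 0.3, (40, 2)),
    rng.normal([5, 5], 0.3, (40, 2)),
])

# Subgroups come from the first group, prototypes from the second.
centers, U = fuzzy_c_means(with_feature, n_clusters=2)
proto_idx = extract_prototypes(centers, without_feature, k=5)
print(centers)
print(proto_idx)
```

Comparing the attributes of each subgroup with those of its prototypes (respondents who are similar but lack the feature) is then the step at which influencing factors would be sought. The memberships in `U` are what distinguishes this from hard clustering: a respondent can belong partly to several subgroups.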

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Applied Mathematics Department, National Technical University of Ukraine "Kyiv Polytechnic Institute", Kyiv, Ukraine
