PAKDD 2005: Advances in Knowledge Discovery and Data Mining, pp. 269-279
Automatic Occupation Coding with Combination of Machine Learning and Hand-Crafted Rules
Abstract
We apply a machine learning method to occupation coding, the task of categorizing answers to open-ended questions about a respondent's occupation. Specifically, we use Support Vector Machines (SVMs) and their combination with hand-crafted rules. Coding occupations manually is expensive and can produce inconsistent results when the coders are not experts in occupation coding. For this reason, a rule-based automatic method has been developed and used. However, its categorization performance is not satisfactory. We therefore adopt SVMs, which have shown high performance in various fields, and compare them with the rule-based method. We also investigate effective ways of combining SVMs with the rule-based method; in our methods, the output of the rule-based method is used as features for the SVMs. We empirically show that SVMs outperform the rule-based method in occupation coding and that combining the two methods yields even better accuracy.
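The combination idea, feeding the rule-based coder's output to the SVM as an extra feature, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: the keyword rule table, occupation codes, feature names, and training answers are hypothetical; the linear SVM is trained with the Pegasos stochastic subgradient method in plain Python rather than whatever solver the paper used; and one-vs-rest is chosen for multiclass prediction only for brevity.

```python
import random
from collections import defaultdict

# Hypothetical hand-crafted rules (keyword -> occupation code), for illustration only.
RULES = {"nurse": "NURSE", "teacher": "TEACHER", "driver": "DRIVER"}


def rule_based_code(text):
    """Toy rule-based coder: return the code of the first matching keyword, else None."""
    for token in text.lower().split():
        if token in RULES:
            return RULES[token]
    return None


def features(text, use_rule=True):
    """Sparse binary features: bag of words, plus the rule-based output if use_rule."""
    feats = {"w=" + tok: 1.0 for tok in text.lower().split()}
    if use_rule:
        code = rule_based_code(text)
        # The rule-based prediction becomes one more binary feature for the SVM.
        feats["rule=" + (code or "NONE")] = 1.0
    return feats


def train_pegasos(data, lam=0.01, epochs=50, seed=0):
    """Binary linear SVM (L2-regularized hinge loss, no bias) trained with Pegasos.

    data: list of (sparse feature dict, label in {-1, +1}).
    """
    rng = random.Random(seed)
    w = defaultdict(float)
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            x, y = data[i]
            eta = 1.0 / (lam * t)
            margin = y * sum(w.get(k, 0.0) * v for k, v in x.items())
            # Regularization step: shrink every weight by (1 - eta * lam).
            for k in w:
                w[k] *= 1.0 - eta * lam
            # Hinge-loss subgradient step, only for examples inside the margin.
            if margin < 1.0:
                for k, v in x.items():
                    w[k] += eta * y * v
    return dict(w)


def train_one_vs_rest(texts, labels, use_rule=True):
    """Train one binary SVM per occupation code (one-vs-rest)."""
    X = [features(t, use_rule) for t in texts]
    models = {}
    for c in sorted(set(labels)):
        data = [(x, 1 if y == c else -1) for x, y in zip(X, labels)]
        models[c] = train_pegasos(data)
    return models


def predict(models, text, use_rule=True):
    """Return the code whose SVM gives the highest decision value."""
    x = features(text, use_rule)
    return max(models, key=lambda c: sum(models[c].get(k, 0.0) * v for k, v in x.items()))


# Hypothetical toy training data (free-text answers and their occupation codes).
train_texts = ["nurse at a city hospital", "hospital nurse", "high school teacher",
               "teacher at cram school", "truck driver", "taxi driver"]
train_labels = ["NURSE", "NURSE", "TEACHER", "TEACHER", "DRIVER", "DRIVER"]
models = train_one_vs_rest(train_texts, train_labels)
print(predict(models, "part-time nurse"))  # NURSE
```

Calling `train_one_vs_rest(..., use_rule=False)` gives a words-only SVM, which is one simple way to compare a plain SVM against the combined model on held-out answers. Because the rule output enters only as a feature, the SVM learns from data how much weight to give it instead of trusting it unconditionally.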
Keywords
Support Vector Machine, Machine Learning Method, Combination Method, General Social Survey, Occupation Data