Automatic Occupation Coding with Combination of Machine Learning and Hand-Crafted Rules

  • Kazuko Takahashi
  • Hiroya Takamura
  • Manabu Okumura
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3518)

Abstract

We apply machine learning to occupation coding, the task of categorizing answers to open-ended questions about a respondent’s occupation. Specifically, we use Support Vector Machines (SVMs), both alone and in combination with hand-crafted rules. Manual occupation coding is expensive and can produce inconsistent results when the coders are not experts in occupation coding. For this reason, a rule-based automatic method has been developed and used. However, its categorization performance is not satisfactory. We therefore adopt SVMs, which perform well in various fields, and compare them with the rule-based method. We also investigate effective ways of combining SVMs with the rule-based method. In our methods, the output of the rule-based method is used as features for the SVMs. We show empirically that SVMs outperform the rule-based method in occupation coding and that combining the two methods yields even better accuracy.
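The combination idea in the abstract, feeding the rule-based output to the SVM as extra features, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the rule table `RULES`, the placeholder codes `C1` to `C3`, the whitespace tokenization, and the function names are hypothetical and are not taken from the paper or from the actual rule-based system. The sketch covers only feature construction; the resulting feature dictionaries would then be vectorized and passed to an SVM trainer of one's choice.

```python
from collections import Counter

# Hypothetical hand-crafted rules: keyword -> occupation code.
# The codes are placeholders, not real classification codes.
RULES = {"teacher": "C1", "nurse": "C2", "driver": "C3"}


def rule_based_code(answer):
    """Return the code of the first matching rule, or None if no rule fires."""
    for token in answer.lower().split():
        if token in RULES:
            return RULES[token]
    return None


def featurize(answer):
    """Build bag-of-words features plus one feature carrying the rule output."""
    features = Counter("word=" + token for token in answer.lower().split())
    predicted = rule_based_code(answer)
    if predicted is not None:
        # The rule-based prediction becomes an additional input feature,
        # so the learner can weigh it against the word evidence.
        features["rule_code=" + predicted] = 1
    return dict(features)


example = featurize("High school teacher")
no_rule = featurize("farm work")
```

In this sketch an answer that no rule covers simply has no `rule_code=` feature, so the classifier falls back on the word features alone.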

Keywords

Support Vector Machine; Machine Learning Method; Combination Method; General Social Survey; Occupation Data
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Kazuko Takahashi (1)
  • Hiroya Takamura (2)
  • Manabu Okumura (2)
  1. Faculty of International Studies, Keiai University, Sakura, Japan
  2. Precision and Intelligence Laboratory, Tokyo Institute of Technology, Yokohama, Japan
