A Web-Based Automated System for Industry and Occupation Coding

  • Conference paper
Web Information Systems Engineering - WISE 2008 (WISE 2008)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 5175))

Abstract

This paper describes our newly developed Automated Industry and Occupation Coding System (AIOCS). The main function of the system is to classify natural-language responses to survey questionnaires into the corresponding numeric codes defined in the standard code book of the Korean National Statistics Office (KNSO). We implemented the system with a range of automated classification techniques, including hand-crafted rules, a maximum entropy model, and information retrieval techniques, to improve the performance of the automated industry/occupation coding task. The result is a Web-based AIOCS offered as a public service through the KNSO Web site. Compared with the previous system developed in 2005, the new Web-based system reduces coding cost, runs faster, and shows a significant improvement in production rate and accuracy. Furthermore, its easy-to-use Web interface makes it more practical to apply.
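
The abstract names the techniques but not how they are combined. The following is only a minimal sketch of one plausible arrangement, not the authors' implementation: hand-crafted keyword rules are tried first, and a simple bag-of-words retrieval over code descriptions is used as a fallback. The code book entries, rule keywords, function names, and threshold are all hypothetical, and the maximum entropy stage is omitted entirely.

```python
import math
from collections import Counter

# Hypothetical code book (code -> description); NOT real KNSO codes.
CODE_BOOK = {
    "A01": "rice farming crop cultivation",
    "B02": "software development computer programming",
    "C03": "restaurant cooking food service",
}

# Hypothetical hand-crafted rules: a keyword that maps directly to a code.
RULES = {"programmer": "B02", "chef": "C03"}


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_code(response, threshold=0.2):
    """Return (code, method); code is None when the response is left for manual coding."""
    tokens = tokenize(response)

    # Stage 1: hand-crafted rules take priority.
    for tok in tokens:
        if tok in RULES:
            return RULES[tok], "rule"

    # Stage 2: retrieve the most similar code description.
    query = Counter(tokens)
    best_code, best_score = None, 0.0
    for code, description in CODE_BOOK.items():
        score = cosine(query, Counter(tokenize(description)))
        if score > best_score:
            best_code, best_score = code, score

    if best_score >= threshold:
        return best_code, "retrieval"
    return None, "manual"


print(assign_code("I work as a chef"))
print(assign_code("rice farming"))
print(assign_code("astronaut"))
```

In a sketch like this, raising the hypothetical `threshold` leaves more responses for manual coding, while lowering it assigns more codes automatically at a higher risk of error.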

Editor information

James Bailey, David Maier, Klaus-Dieter Schewe, Bernhard Thalheim, Xiaoyang Sean Wang

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jung, Y., Yoo, J., Myaeng, SH., Han, DC. (2008). A Web-Based Automated System for Industry and Occupation Coding. In: Bailey, J., Maier, D., Schewe, KD., Thalheim, B., Wang, X.S. (eds) Web Information Systems Engineering - WISE 2008. WISE 2008. Lecture Notes in Computer Science, vol 5175. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85481-4_33

  • DOI: https://doi.org/10.1007/978-3-540-85481-4_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85480-7

  • Online ISBN: 978-3-540-85481-4

  • eBook Packages: Computer Science, Computer Science (R0)
