Abstract
A novel NLP task, automatic survey coding, is described, and two methods for performing this task are presented. The first method uses a Boolean pattern-matching strategy to code survey responses, while the second uses a vector-based (probabilistic) method. The performance of the two methods is tested and compared on three representative survey datasets. The Boolean method is shown to perform slightly better on average than the vector-based method. Linguistic factors affecting the difficulty of the coding task for each survey are discussed.
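The contrast between the two families of methods can be sketched with a toy example. This is a minimal illustration, not the paper's implementation: the category codes, the rules, and the training responses are all hypothetical, and a simple cosine-similarity centroid matcher stands in for the vector-based (probabilistic) method, whose details the abstract does not give.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a response and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


# Hypothetical Boolean rules: each code maps to a list of alternatives (OR);
# every term inside one alternative must be present (AND).
BOOLEAN_RULES = {
    "COST": [{"too", "expensive"}, {"price"}, {"cost"}],
    "TIME": [{"no", "time"}, {"busy"}],
}


def code_boolean(response, rules=BOOLEAN_RULES, default="OTHER"):
    """Return the first code whose Boolean pattern matches the response."""
    tokens = set(tokenize(response))
    for code, alternatives in rules.items():
        if any(required <= tokens for required in alternatives):
            return code
    return default


def vectorize(text):
    """Represent a response as a bag-of-words term-frequency vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_centroids(labelled):
    """Sum the term vectors of hand-coded responses, one centroid per code."""
    centroids = {}
    for text, code in labelled:
        centroids.setdefault(code, Counter()).update(vectorize(text))
    return centroids


def code_vector(response, centroids, default="OTHER"):
    """Assign the code whose centroid is most similar to the response."""
    vec = vectorize(response)
    best_code, best_score = default, 0.0
    for code, centroid in centroids.items():
        score = cosine(vec, centroid)
        if score > best_score:
            best_code, best_score = code, score
    return best_code


# Hypothetical hand-coded training responses.
TRAINING = [
    ("it costs too much money", "COST"),
    ("I am always busy at work", "TIME"),
]
CENTROIDS = build_centroids(TRAINING)
```

In this sketch the Boolean coder needs hand-written rules but no training data, while the vector coder needs previously coded responses but no rules; both fall back to a default code when nothing matches.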
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Viechnicki, P. (1998). A performance evaluation of automatic survey classifiers. In: Honavar, V., Slutzki, G. (eds) Grammatical Inference. ICGI 1998. Lecture Notes in Computer Science, vol 1433. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0054080
DOI: https://doi.org/10.1007/BFb0054080
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64776-8
Online ISBN: 978-3-540-68707-8
eBook Packages: Springer Book Archive