Abstract
Part-of-speech (POS) tags have been employed in automatic genre classification because they do not ‘reflect the topic of the document, but rather the type of text used in the document’ and because their distribution has been observed to vary across genres. The current study introduces a new set of linguistically fine-grained POS tags, generated by AUTASYS, for automatic genre classification. The experiment was designed to compare the proposed feature set with two alternatives: word unigrams used as a bag of words (BOW) and an impoverished POS tag set. Machine-learning tools were used to evaluate classification performance in terms of F-score. The British component of the International Corpus of English (ICE-GB) served as a resource of different text genres. Ten genre classification tasks were defined on the basis of the existing ICE-GB categories, grouped at different levels of granularity. Our results show that linguistically rich POS tags, used as discriminative features, outperform BOW in fine-grained genre classification. They further indicate that the superior performance is due to the rich linguistic information, since an impoverished tag set yielded worse classification results.
This study was originally presented at the 24th Pacific Asia Conference on Language, Information and Computation, Sendai, Japan, 4–7 November 2010.
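As a rough illustration of the two ingredients named in the abstract, the following Python sketch turns a document's POS tag sequence into a relative-frequency feature vector and computes an F-score from gold and predicted genre labels. It is a minimal sketch under stated assumptions, not the chapter's implementation: the tag names and genre labels are hypothetical placeholders rather than the AUTASYS tag set or the ICE-GB categories, the function names are invented for this example, and macro-averaging over genres is an assumption, since the abstract says only that F-score was used.

```python
from collections import Counter


def tag_distribution(tags):
    """Relative frequency of each POS tag in one document."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


def f_score(gold, predicted, label):
    """F-score (harmonic mean of precision and recall) for one genre label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f_score(gold, predicted):
    """Unweighted mean of per-label F-scores (averaging scheme is an assumption)."""
    labels = sorted(set(gold))
    return sum(f_score(gold, predicted, label) for label in labels) / len(labels)


# Hypothetical tag sequence and genre labels, for illustration only.
features = tag_distribution(["N", "V", "N", "ADJ"])
gold = ["spoken", "spoken", "written", "written"]
predicted = ["spoken", "written", "written", "written"]
score = macro_f_score(gold, predicted)
```

In a comparison of the kind the abstract describes, the same scoring function would be applied to classifiers trained on each feature set (fine-grained tags, impoverished tags, and BOW), so that only the features differ between runs.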
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Fang, A., Cao, J. (2015). Part-of-Speech Tags and ICE Text Classification. In: Text Genres and Registers: The Computation of Linguistic Features. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45100-7_5
DOI: https://doi.org/10.1007/978-3-662-45100-7_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45099-4
Online ISBN: 978-3-662-45100-7
eBook Packages: Humanities, Social Sciences and Law; Social Sciences (R0)