Semantic Domains in Text Categorization
In the previous chapter we showed that Domain Models (DMs) provide a very flexible and cheap solution to the problem of modeling domain knowledge. In particular, DMs were used to define a Domain Space, in which the similarity among terms and texts is estimated.
We will show how to exploit DMs inside a supervised machine learning framework, so as to supply supervised NLP systems with “external” knowledge that can be profitably used for topic similarity estimation. In particular, we exploit the Domain Kernel (defined in Sect. 3.7), a similarity function among terms and texts that can be used by any kernel-based learning algorithm, with the effect of avoiding the problems of lexical variability and ambiguity.
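To make the idea of a domain-based similarity function concrete, the following is a minimal sketch and not the definition given in Sect. 3.7. It assumes a hypothetical toy domain model (`DOMAIN_MODEL`, with made-up weights over two illustrative domains) and uses plain cosine similarity in that space; the function names `to_domain_space`, `domain_kernel`, and `bow_kernel` are introduced here only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy domain model (an assumption, not taken from the chapter):
# each term has relevance weights for two illustrative domains,
# (MEDICINE, COMPUTER_SCIENCE). The values are invented.
DOMAIN_MODEL = {
    "virus": (0.5, 0.5),     # ambiguous term, relevant to both domains
    "hiv": (1.0, 0.0),
    "aids": (1.0, 0.0),
    "laptop": (0.0, 1.0),
    "software": (0.0, 1.0),
}


def to_domain_space(tokens):
    """Map a text to the domain space by summing its terms' domain vectors."""
    vec = [0.0, 0.0]
    for tok in tokens:
        weights = DOMAIN_MODEL.get(tok.lower())
        if weights is None:
            continue  # terms unknown to the toy model are ignored
        for i, w in enumerate(weights):
            vec[i] += w
    return vec


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / norm


def domain_kernel(tokens_a, tokens_b):
    """Similarity of two texts measured in the toy domain space."""
    return cosine(to_domain_space(tokens_a), to_domain_space(tokens_b))


def bow_kernel(tokens_a, tokens_b):
    """Baseline: cosine similarity over raw bag-of-words counts."""
    ca = Counter(t.lower() for t in tokens_a)
    cb = Counter(t.lower() for t in tokens_b)
    vocab = sorted(set(ca) | set(cb))
    return cosine([ca[t] for t in vocab], [cb[t] for t in vocab])


if __name__ == "__main__":
    medical_a = ["HIV", "virus"]
    medical_b = ["AIDS"]
    computing = ["laptop", "software"]
    print(bow_kernel(medical_a, medical_b))     # no shared words
    print(domain_kernel(medical_a, medical_b))  # same domain, high similarity
    print(domain_kernel(medical_a, computing))  # different domains, lower
```

In this toy setting the two medical texts share no words, so the bag-of-words baseline sees them as unrelated, while the domain-space similarity is high because both map mostly to the same domain. This is the kind of lexical variability the passage refers to. The ambiguous term "virus" contributes to both domains, and the surrounding terms shift the text vector toward one of them.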
In this chapter we show the advantages of a domain-based feature representation in supervised learning, approaching the problem of Text Categorization. In particular, we evaluate the Domain Kernel in two tasks: Text Categorization (see Sect. 4.1) and Intensional Learning (see Sect. 4.2).
Keywords: Labeled Data, Unlabeled Data, Domain Space, Semantic Domain, Intensional Learning