A Neural Network for Text Representation
Text categorization and retrieval tasks often depend on a good representation of textual data. Departing from the classical vector space model, several probabilistic models, such as PLSA, have recently been proposed. In this paper, we propose a non-probabilistic, neural-network-based solution that jointly captures a rich representation of words and documents. In experiments on two information retrieval tasks, using the TDT2 database and the TREC-8 and TREC-9 query sets, the proposed neural network model performed better than both PLSA and the classical TFIDF representation.
Keywords: Latent Dirichlet Allocation, Hidden Unit, Latent Semantic Analysis, Vector Space Model, Retrieval Task