On the Use of Topic Models for Word Completion

  • Elisabeth Wolf
  • Shankar Vembu
  • Tristan Miller
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4139)


We investigate the use of topic models, such as probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA), for word completion tasks. The advantage of using these models for such an application is twofold. On the one hand, they allow us to exploit semantic or contextual information when predicting candidate words for completion. On the other hand, these probabilistic models have been found to outperform classical latent semantic analysis (LSA) for modeling text documents. We describe a word completion algorithm that takes into account the semantic context of the word being typed. We also present evaluation metrics for comparing the models used in our study. Our experiments support our hypothesis that probabilistic models are well suited to the semantic analysis of text documents and to word completion tasks.
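
As a rough illustration of the general idea, and not the algorithm described in the paper, the sketch below ranks completion candidates by combining a prefix match with a topic mixture inferred from the preceding words. Everything in it is an assumption made for the example: the two hand-made topics, their probabilities, the uniform topic prior, and the function names `topic_posterior` and `complete`. A real system would estimate the word-given-topic distributions with PLSA or LDA on a corpus.

```python
from typing import Dict, List

# Hypothetical toy model: P(word | topic) for two hand-made topics.
# The numbers are illustrative only and are not taken from the paper.
WORD_GIVEN_TOPIC: Dict[str, Dict[str, float]] = {
    "finance": {"bank": 0.40, "bond": 0.30, "loan": 0.20, "boat": 0.05, "river": 0.05},
    "nature": {"bank": 0.10, "bond": 0.05, "loan": 0.05, "boat": 0.40, "river": 0.40},
}


def topic_posterior(context: List[str]) -> Dict[str, float]:
    """Estimate P(topic | context), assuming a uniform topic prior and
    context words that are independent given the topic."""
    scores: Dict[str, float] = {}
    for topic, dist in WORD_GIVEN_TOPIC.items():
        score = 1.0
        for word in context:
            # Small floor so that unseen words do not zero out a topic.
            score *= dist.get(word, 1e-6)
        scores[topic] = score
    total = sum(scores.values())
    return {topic: score / total for topic, score in scores.items()}


def complete(prefix: str, context: List[str], k: int = 3) -> List[str]:
    """Return up to k words starting with `prefix`, ranked by
    P(word | context) = sum over topics of P(word | topic) * P(topic | context)."""
    posterior = topic_posterior(context)
    vocabulary = {w for dist in WORD_GIVEN_TOPIC.values() for w in dist}
    scored = []
    for word in vocabulary:
        if word.startswith(prefix):
            prob = sum(
                posterior[topic] * WORD_GIVEN_TOPIC[topic].get(word, 0.0)
                for topic in posterior
            )
            scored.append((word, prob))
    # Highest probability first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in scored[:k]]


if __name__ == "__main__":
    print(complete("b", ["river"]))
    print(complete("b", ["loan"]))
```

With the same prefix, the ranking shifts with the context: a nature-related context favours "boat", while a finance-related one favours "bank" and "bond". This is the kind of semantic sensitivity that a purely frequency-based completer lacks.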


Keywords (machine-generated, not supplied by the authors): Singular Value Decomposition; Topic Model; Latent Dirichlet Allocation; Text Document; Latent Semantic Analysis





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Elisabeth Wolf (1)
  • Shankar Vembu (1)
  • Tristan Miller (2)

  1. German Research Center for Artificial Intelligence, Kaiserslautern, Germany
  2. The Socialist Party of Great Britain, London, United Kingdom
