Behavior Research Methods, Volume 41, Issue 4, pp 1201–1209

Effect of tuned parameters on an LSA multiple choice questions answering model

  • Alain Lifchitz
  • Sandra Jhean-Larose
  • Guy Denhière

Abstract

This article presents the current state of a work in progress whose objective is to better understand the factors that significantly influence the performance of latent semantic analysis (LSA). A difficult task, answering (French) biology multiple choice questions, was used to test the semantic properties of the truncated singular space and to study the relative influence of the main parameters. Dedicated software was designed to fine-tune the LSA semantic space for the multiple choice questions task. With optimal parameters, the performance of our simple model was, quite surprisingly, equal or superior to that of seventh- and eighth-grade students. This indicates that the semantic spaces were quite good despite their low dimensionality and the small size of the training data sets. In addition, we present an original entropy global weighting of the answers’ terms for each of the multiple choice questions, which was necessary to achieve the model’s success.
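
The abstract does not spell out the model itself, so the following is only a minimal sketch of a generic LSA multiple choice pipeline, written in Python with NumPy. It assumes the standard log-entropy term weighting as a stand-in for the authors' original per-question entropy weighting (which is not described here), a plain truncated SVD, and cosine similarity between the question and each candidate answer. All function names and the toy term-by-document counts are hypothetical.

```python
import numpy as np

def log_entropy_weight(counts):
    """Standard log-entropy weighting of a term-by-document count matrix.
    Assumption: a stand-in for the paper's own entropy weighting."""
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]  # must be > 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # p[i, j]: share of term i's occurrences that fall in document j
    p = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    # p * log(p), with the convention 0 * log(0) = 0
    plogp = p * np.log(np.where(p > 0, p, 1.0))
    global_w = 1.0 + plogp.sum(axis=1) / np.log(n_docs)
    return np.log1p(counts) * global_w[:, None], global_w

def build_space(counts, k):
    """Truncated SVD of the weighted matrix; keeps the k leading term vectors."""
    weighted, global_w = log_entropy_weight(counts)
    U, s, Vt = np.linalg.svd(weighted, full_matrices=False)
    return U[:, :k], global_w

def fold_in(term_counts, U_k, global_w):
    """Project a bag-of-words vector into the k-dimensional space.
    One common convention (no rescaling by the singular values)."""
    weighted = np.log1p(np.asarray(term_counts, dtype=float)) * global_w
    return weighted @ U_k

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def answer_mcq(question, answers, U_k, global_w):
    """Return the index of the answer closest to the question, plus all scores."""
    q = fold_in(question, U_k, global_w)
    scores = [cosine(q, fold_in(a, U_k, global_w)) for a in answers]
    return int(np.argmax(scores)), scores

# Hypothetical toy corpus. Rows: cell, nucleus, membrane, leaf, root, stem.
# Columns: four short training documents.
counts = np.array([
    [1, 1, 0, 0],   # cell
    [1, 1, 0, 0],   # nucleus
    [1, 0, 0, 0],   # membrane
    [0, 0, 1, 1],   # leaf
    [0, 0, 1, 0],   # root
    [0, 0, 0, 1],   # stem
])
U_k, global_w = build_space(counts, k=2)

question = [1, 0, 1, 0, 0, 0]          # "cell membrane"
answers = [
    [0, 0, 0, 0, 1, 0],                # "root"
    [0, 1, 0, 0, 0, 0],                # "nucleus"
    [0, 0, 0, 1, 0, 1],                # "leaf stem"
]
best, scores = answer_mcq(question, answers, U_k, global_w)
print(best, scores)
```

In this toy setup the question shares no word with the "nucleus" answer, yet both load on the same latent dimension, so that answer is selected. The number of retained dimensions `k` and the weighting scheme are exactly the kind of parameters whose influence the article studies.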

Copyright information

© Psychonomic Society, Inc. 2009

Authors and Affiliations

  • Alain Lifchitz (1)
  • Sandra Jhean-Larose (2)
  • Guy Denhière (2)

  1. LIP6-DAPA, Université Pierre et Marie Curie, CNRS, Paris, France
  2. Équipe CHArt, EPHE-CNRS, Paris, France
