Discovering User Profiles from Semantically Indexed Scientific Papers

  • Conference paper
From Web to Social Web: Discovering and Deploying User and Content Profiles (WebMine 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4737)

Abstract

Personalized information recommendation services typically infer the user profile, a structured model of the user's interests, automatically from documents that the user has already deemed relevant. We present an approach based on Word Sense Disambiguation (WSD) for extracting user profiles from documents. The approach relies on a knowledge-based WSD algorithm, called JIGSAW, for the semantic indexing of documents: JIGSAW exploits the WordNet lexical database to select, among all the possible meanings (senses) of a polysemous word, the correct one. The semantically indexed documents are then used to train a naïve Bayes learner that infers “semantic”, sense-based user profiles as binary text classifiers (user-likes and user-dislikes).
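The profile-learning step described above can be illustrated with a minimal sketch, assuming each document has already been reduced to a list of sense identifiers by some WSD step. This is not the authors' implementation: the function names `train_nb` and `classify`, the sense-ID strings, the multinomial event model, and the add-one (Laplace) smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter

def train_nb(docs):
    """Train from (sense_ids, label) pairs; label is "likes" or "dislikes"."""
    class_docs = Counter()   # number of training documents per class
    sense_counts = {}        # per-class counts of each sense identifier
    vocab = set()            # all sense identifiers seen in training
    for senses, label in docs:
        class_docs[label] += 1
        sense_counts.setdefault(label, Counter()).update(senses)
        vocab.update(senses)
    return class_docs, sense_counts, vocab

def classify(model, senses):
    """Return the class with the highest log-posterior for a sense list."""
    class_docs, sense_counts, vocab = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label, n_docs in class_docs.items():
        counts = sense_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing (assumed)
        score = math.log(n_docs / total_docs)      # class prior
        for s in senses:
            if s in vocab:                         # skip senses unseen in training
                score += math.log((counts[s] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical sense identifiers; real ones would come from the WSD step.
training = [
    (["learning#n#1", "classifier#n#1"], "likes"),
    (["classifier#n#1", "profile#n#2"], "likes"),
    (["bank#n#2", "river#n#1"], "dislikes"),
]
model = train_nb(training)
prediction = classify(model, ["classifier#n#1"])
```

Because the features are sense identifiers rather than surface words, two documents that use different words for the same concept can contribute to the same feature count, which is the intuition behind a sense-based profile.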

Two empirical evaluations are described in the paper. In the first, JIGSAW was evaluated according to the parameters of the Senseval-3 initiative, which provides a forum where WSD systems are assessed against disambiguated datasets. The second measured the accuracy of the user profiles in selecting relevant documents to recommend. The performance of classical keyword-based profiles was compared with that of sense-based profiles in the task of recommending scientific papers. The results show that sense-based profiles outperform keyword-based ones.
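The abstract does not name the measures used in the second evaluation. As a rough sketch of how the quality of a profile's selections could be scored, the snippet below computes precision and recall, which are assumed here only because they are a common choice for this kind of task. The function name and the paper identifiers are hypothetical.

```python
def precision_recall(recommended, relevant):
    """Precision and recall of a set of recommended items against relevant ones."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)  # items both recommended and relevant
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy identifiers, not data from the paper.
precision, recall = precision_recall(
    recommended=["p1", "p2", "p3"],
    relevant=["p2", "p3", "p4", "p5"],
)
```

Running the same function over the recommendations produced by a keyword-based profile and by a sense-based profile, for the same user, would give a like-for-like comparison of the two.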

Editor information

Bettina Berendt, Andreas Hotho, Dunja Mladenic, Giovanni Semeraro

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Semeraro, G., Basile, P., de Gemmis, M., Lops, P. (2007). Discovering User Profiles from Semantically Indexed Scientific Papers. In: Berendt, B., Hotho, A., Mladenic, D., Semeraro, G. (eds) From Web to Social Web: Discovering and Deploying User and Content Profiles. WebMine 2006. Lecture Notes in Computer Science, vol 4737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74951-6_4

  • DOI: https://doi.org/10.1007/978-3-540-74951-6_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74950-9

  • Online ISBN: 978-3-540-74951-6

  • eBook Packages: Computer Science, Computer Science (R0)
