We present the information extraction algorithms used in Sightsplanner, a semantic personalised tourist recommender system. The main challenges are that information is spread across many sources, is usually stored in proprietary formats, and is available in different languages with varying degrees of accuracy. We address each of these challenges and describe our implementation and ideas for dealing with them: scraping and extracting keywords from web portals in different languages, handling missing multilingual data, and identifying the same objects across different sources.
Keywords
- recommender system
- information retrieval
- entity disambiguation