Semantic relatedness between words has been successfully extracted from navigation on Wikipedia pages. However, the navigational data used in the corresponding works are sparse and likely to be biased, since they were collected in the context of games. In this paper, we address this limitation and explore whether semantic relatedness can also be extracted from unconstrained navigation. To this end, we first highlight structural differences between unconstrained navigation and game data. Then, we adapt a state-of-the-art approach for extracting semantic relatedness from Wikipedia paths. We apply this approach to transitions derived from two unconstrained navigation datasets as well as to transitions from WikiGame, and compare the results on two common gold standards. We confirm the expected structural differences between unconstrained navigation and the paths collected by WikiGame. In line with this result, the state-of-the-art approach for semantic extraction from navigation data does not yield good results on unconstrained navigation. Yet, we are able to derive a relatedness measure that performs well on both unconstrained navigation data and game data. Overall, we show that unconstrained navigation data on Wikipedia is suited for extracting semantics.
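The abstract does not spell out the relatedness measure, so the following is only a minimal sketch of the general idea of deriving relatedness from navigation transitions. It assumes (i) each page is represented by counts of the pages it shares a transition with, in either direction, and (ii) relatedness is the cosine similarity of these count vectors. The function names `transition_vectors` and `relatedness` and the toy transitions are hypothetical; this is not the adapted approach or the derived measure from the paper.

```python
from collections import Counter, defaultdict
from math import sqrt


def transition_vectors(transitions):
    """Build one sparse count vector per page from (source, target) transitions.

    Assumption (illustrative only): a page's vector counts every page it shares
    a transition with, regardless of direction.
    """
    vectors = defaultdict(Counter)
    for source, target in transitions:
        if source == target:
            continue  # ignore self-loops such as page reloads
        vectors[source][target] += 1
        vectors[target][source] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def relatedness(vectors, a, b):
    """Relatedness of pages a and b; 0.0 if either page never occurs."""
    if a not in vectors or b not in vectors:
        return 0.0  # checked first so the defaultdict is not modified
    return cosine(vectors[a], vectors[b])


# Hypothetical toy transitions (not from any of the paper's datasets).
toy = [("Cat", "Dog"), ("Dog", "Cat"), ("Cat", "Pet"), ("Dog", "Pet"),
       ("Car", "Road"), ("Road", "Car"), ("Car", "Engine")]
vecs = transition_vectors(toy)
print(round(relatedness(vecs, "Cat", "Dog"), 3))  # 0.2
print(round(relatedness(vecs, "Cat", "Car"), 3))  # 0.0
```

Comparing neighbourhoods rather than counting direct transitions lets two pages score as related even if users never click directly from one to the other; whether this suits sparse game paths or dense unconstrained navigation is exactly the kind of question the paper examines.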
The performance decrease on fewer evaluation pairs can be explained as follows. On the one hand, with fewer data points to correlate, the correlation task becomes easier, and one might therefore expect the correlation value to rise. On the other hand, data points with faulty ordering have a greater impact on the correlation score. If we remove “good” data points, i.e. points whose ordering correlates well, we are left with a larger share of “bad” data points. It is therefore possible to decrease correlation performance by leaving out data points.
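This effect can be checked on a hypothetical toy example, assuming the evaluation uses a rank correlation such as Spearman's ρ, the usual choice for word-relatedness gold standards. Six invented evaluation pairs have gold scores `gold` and predicted scores `pred`; only the last two are ordered incorrectly. ρ is computed by hand as the Pearson correlation of average ranks, which is also what `scipy.stats.spearmanr` computes; the helper names are illustrative.

```python
def ranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Invented data: the last two pairs are swapped in the prediction.
gold = [1, 2, 3, 4, 5, 6]
pred = [1, 2, 3, 4, 6, 5]
print(round(spearman(gold, pred), 3))          # 0.943 on all six pairs
print(round(spearman(gold[3:], pred[3:]), 3))  # 0.5 after dropping three "good" pairs
```

Dropping the three well-ordered pairs lowers ρ from about 0.94 to 0.5, whereas dropping the two misordered pairs instead would raise it to 1.0: which points are left out matters more than how many.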
The restriction of WikiLink to WikiStream source pages (restricted) should in principle contain the same number of matchable evaluation pairs. We attribute the observed difference to the ever-changing nature of Wikipedia.
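Why the counts should agree, and how snapshot drift can break that, can be sketched on invented data. The sketch assumes that `stream` stands in for WikiStream transitions and `links` for WikiLink, that an evaluation pair is "matchable" when both terms occur as source pages (matched by exact title), and that the helper names are hypothetical; the paper's actual matching procedure may differ.

```python
def restrict_to_sources(transitions, allowed_sources):
    """Keep only the transitions whose source page is in allowed_sources."""
    return [(s, t) for s, t in transitions if s in allowed_sources]


def matchable_pairs(pairs, transitions):
    """Evaluation pairs whose two terms both occur as source pages.

    Assumption for illustration: terms are matched to pages by exact title
    and only source pages count.
    """
    sources = {s for s, _ in transitions}
    return [(a, b) for a, b in pairs if a in sources and b in sources]


# Hypothetical toy data; page titles and pairs are invented.
stream = [("Cat", "Dog"), ("Car", "Road"), ("Dog", "Cat")]
links = [("Cat", "Pet"), ("Car", "Road"), ("Dog", "Cat"), ("Tree", "Leaf")]
pairs = [("Cat", "Dog"), ("Car", "Cat"), ("Tree", "Car")]

restricted = restrict_to_sources(links, {s for s, _ in stream})
print(len(matchable_pairs(pairs, stream)), len(matchable_pairs(pairs, restricted)))  # 2 2

# If the link snapshot no longer contains the page "Dog" (e.g. it was renamed
# or deleted between snapshots), the counts stop agreeing.
later = [(s, t) for s, t in links if s != "Dog"]
restricted_later = restrict_to_sources(later, {s for s, _ in stream})
print(len(matchable_pairs(pairs, restricted_later)))  # 1
```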
This work is funded by the DFG through the PoSTS II project. We also want to thank Alex Clemesha for providing us with the game data from the WikiGame website.
Cite this article
Niebler, T., Schlör, D., Becker, M. et al. Extracting Semantics from Unconstrained Navigation on Wikipedia. Künstl Intell 30, 163–168 (2016). https://doi.org/10.1007/s13218-015-0417-5
- Usage analysis
- Semantic web