Extracting Semantics from Unconstrained Navigation on Wikipedia
Semantic relatedness between words has been successfully extracted from navigation on Wikipedia pages. However, the navigational data used in these works are sparse and likely biased, since they were collected in the context of games. In this paper, we address this limitation and explore whether semantic relatedness can also be extracted from unconstrained navigation. To this end, we first highlight structural differences between unconstrained navigation and game data. Then, we adapt a state-of-the-art approach for extracting semantic relatedness from Wikipedia paths. We apply this approach to transitions derived from two unconstrained navigation datasets as well as to transitions from WikiGame, and compare the results against two common gold standards. We confirm the expected structural differences between unconstrained navigation and the paths collected by WikiGame. In line with this result, the state-of-the-art approach for semantic extraction from navigation data does not yield good results for unconstrained navigation. Yet, we are able to derive a relatedness measure that performs well on both unconstrained navigation data and game data. Overall, we show that unconstrained navigation data on Wikipedia is suitable for extracting semantics.
Keywords: Usage analysis, Semantic web
This work is funded by the DFG through the PoSTS II project. We also want to thank Alex Clemesha for providing us with the game data from the WikiGame website.