Information Retrieval, Volume 17, Issue 4, pp 295–325

A survey of approaches for ranking on the web of data

  • Antonio J. Roa-Valverde
  • Miguel-Angel Sicilia
Article

Abstract

Ranking information resources is a task that usually takes place within more complex workflows and occurs in virtually every form of information retrieval; it is most commonly implemented by Web search engines. By filtering and rating data, ranking strategies guide users as they explore large volumes of information items. A considerable number of ranking algorithms exist, following different approaches that focus on different aspects of this complex problem and reflecting the variety of strategies that can be applied. With the growth of the Web of Linked Data, a new problem space for ranking algorithms has emerged, as the nature of the information items to be ranked is very different from that of Web pages. As a consequence, existing ranking algorithms have been adapted to Linked Data, and some specific strategies have started to be proposed and implemented. Researchers and organizations deploying Linked Data solutions thus require an understanding of the applicability, characteristics and state of evaluation of ranking strategies and algorithms as applied to Linked Data. We present a classification that formalizes and contextualizes the problem of ranking Linked Data under a common terminology. In addition, we analyze and contrast the similarities, differences and applicability of the different approaches. We intend this work to be useful when comparing different approaches to ranking Linked Data and when implementing new algorithms.

Keywords

  • Linked data
  • Information retrieval
  • Semantic search
  • Ranking algorithms
  • Link analysis
  • Semantic web data management

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Antonio J. Roa-Valverde (1)
  • Miguel-Angel Sicilia (1)
  1. STI Innsbruck, Innsbruck, Austria
