
Integrated Retrieval from Web of Documents and Data

Chapter in Advances in Data Management

Abstract

The Semantic Web is evolving into a property-linked web of data that is conceptually different from, but contained in, the Web of hyperlinked documents. Data Retrieval techniques are typically used to retrieve data from the Semantic Web, while Information Retrieval techniques are used to retrieve documents from the Hypertext Web. We present a Unified Web model that integrates the two webs and formalizes the connections between them. We then present an approach to retrieving documents and data that captures the best of both worlds. Specifically, it improves recall for legacy documents and provides keyword-based search capability for the Semantic Web. We specify the Hybrid Query Language that embodies this approach and describe SITAR, the prototype system that implements it. We conclude with directions for future work.




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Thirunarayan, K., Immaneni, T. (2009). Integrated Retrieval from Web of Documents and Data. In: Ras, Z.W., Dardzinska, A. (eds) Advances in Data Management. Studies in Computational Intelligence, vol 223. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02190-9_2


  • DOI: https://doi.org/10.1007/978-3-642-02190-9_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02189-3

  • Online ISBN: 978-3-642-02190-9

  • eBook Packages: Engineering, Engineering (R0)
