Abstract
The Semantic Web is evolving into a property-linked web of data, conceptually different from but contained in the Web of hyperlinked documents. Data Retrieval techniques are typically used to retrieve data from the Semantic Web, while Information Retrieval techniques are used to retrieve documents from the Hypertext Web. We present a Unified Web model that integrates the two webs and formalizes the connection between them. We then present an approach to retrieving documents and data that captures the best of both worlds. Specifically, it improves recall for legacy documents and provides keyword-based search capability for the Semantic Web. We specify the Hybrid Query Language that embodies this approach and describe SITAR, the prototype system that implements it. We conclude with areas of future work.
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Thirunarayan, K., Immaneni, T. (2009). Integrated Retrieval from Web of Documents and Data. In: Ras, Z.W., Dardzinska, A. (eds) Advances in Data Management. Studies in Computational Intelligence, vol 223. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02190-9_2
DOI: https://doi.org/10.1007/978-3-642-02190-9_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02189-3
Online ISBN: 978-3-642-02190-9
eBook Packages: Engineering, Engineering (R0)