
Sequoia—An Approach to Declarative Information Retrieval

  • Focus article
  • Published:
Datenbank-Spektrum

Abstract

In this work, we propose an approach that allows users to query heterogeneous data sources on the Web in a declarative fashion. Such an approach provides a generic way to formulate a wide range of information needs and is much more powerful than simple keyword queries. Particularly appealing are the abilities to combine (join) information from different sources and to compute simple statistics that can be used to select promising pieces of information. What might sound like a hopeless effort, given the inherent complexity that SQL-style queries can express, turns out on closer inspection to be easy to understand and use. Even very simple combinations (i.e., joins) of different data sources (i.e., tables) offer a surprisingly large set of interesting use cases. This holds in particular for sliding window joins, which limit the scope of interest to recent information obtained, for instance, from the live stream of Twitter Tweets. This goes far beyond keyword queries enriched with operators such as allintext:, allintitle:, or site:, as supported, for instance, by the Google search engine.
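To make the idea of a sliding window join concrete, here is a minimal Python sketch. It is not Sequoia's implementation; it is a generic symmetric time-based window join under stated assumptions: two interleaved streams labelled "tweet" and "news" (hypothetical labels), items arriving in non-decreasing timestamp order, and a join predicate of "the two items share at least one keyword". The `Item` type, the `sliding_window_join` function, and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Item:
    """Hypothetical stream element: source label, timestamp in seconds, text, keywords."""
    source: str  # "tweet" or "news" (illustrative labels, an assumption)
    ts: int
    text: str
    keywords: frozenset = field(default_factory=frozenset)


def sliding_window_join(events: Iterable[Item], window: int) -> Iterator[Tuple[Item, Item]]:
    """Symmetric time-based window join of two interleaved streams.

    Assumes `events` arrive in non-decreasing timestamp order. A tweet and a
    news item are joined when they share at least one keyword and their
    timestamps differ by at most `window` seconds. Yields (tweet, news) pairs.
    """
    windows = {"tweet": deque(), "news": deque()}
    for item in events:
        other = "news" if item.source == "tweet" else "tweet"
        # Evict items that are now older than `window` relative to the current
        # item; because input is time-ordered, they can never match again.
        for buf in windows.values():
            while buf and item.ts - buf[0].ts > window:
                buf.popleft()
        # Probe the other stream's window for items sharing a keyword.
        for cand in windows[other]:
            if item.keywords & cand.keywords:
                yield (item, cand) if item.source == "tweet" else (cand, item)
        # Remember the current item so later items of the other stream can match it.
        windows[item.source].append(item)


# Hypothetical sample data: one news item followed by three tweets.
stream = [
    Item("news", 0, "Budget talks stall", frozenset({"budget", "congress"})),
    Item("tweet", 30, "Congress still arguing?", frozenset({"congress"})),
    Item("tweet", 400, "Nice weather today", frozenset({"weather"})),
    Item("tweet", 700, "budget deal soon", frozenset({"budget"})),
]

for tweet, news in sliding_window_join(stream, window=600):
    print(tweet.text, "<->", news.text)
```

With a 600-second window, only the tweet at t = 30 is paired with the news item. By the time the tweet at t = 700 arrives, the news item from t = 0 has been evicted, even though the two share the keyword "budget". Evicting on arrival keeps each window bounded by the number of items received within `window` seconds, which is what makes this kind of join practical over a live stream.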


Fig. 1
Fig. 2
Fig. 3
Fig. 4


Notes

  1. While news sites and blogs were crawled continuously over this period, Tweets were collected only during a window of a few days in late July 2011.


Acknowledgements

This work has been supported by the Excellence Cluster on Multimodal Computing and Interaction (MMCI).

Author information


Corresponding author

Correspondence to Sebastian Michel.


About this article

Cite this article

Pinkel, C., Alvanaki, F. & Michel, S. Sequoia—An Approach to Declarative Information Retrieval. Datenbank Spektrum 12, 101–108 (2012). https://doi.org/10.1007/s13222-012-0087-5


  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s13222-012-0087-5
