
World Wide Web, Volume 18, Issue 3, pp 491–520

DB-IR integration using tight-coupling in the Odysseus DBMS

  • Kyu-Young Whang
  • Jae-Gil Lee
  • Min-Jae Lee
  • Wook-Shin Han
  • Min-Soo Kim
  • Jun-Sung Kim

Abstract

As many recent applications require integration of structured data and text data, unifying database (DB) and information retrieval (IR) technologies has become one of the major challenges in our field. There have been active discussions on the system architecture for DB-IR integration, but a clear agreement has not yet been reached. Along this direction, we have advocated the use of the tight-coupling architecture and developed a novel structure of the IR index as well as tightly-coupled query processing algorithms. In tight-coupling, the text data type is supported from the storage system just like a built-in data type so that the query processor can efficiently handle queries involving both structured data and text data. In this paper, for archival purposes, we consolidate our achievements reported in non-regular publications over the last ten years or so, extending them by adding greater detail on the IR index and the query processing algorithms. All the features in this paper are fully implemented in the Odysseus DBMS, which has been under development at KAIST for over 23 years. We show that Odysseus significantly outperforms two open-source DBMSs and one open-source search engine (with a few exceptions) in processing DB-IR integration queries. These results demonstrate the superiority of the tight-coupling architecture for DB-IR integration.
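
To make the notion of a DB-IR integration query concrete, the following is a minimal sketch of a query that combines a keyword condition on text with a predicate on a structured attribute. It is a toy illustration only and does not reproduce the Odysseus IR index or its query processing algorithms; the records, the field names (`year`, `text`), and the functions `build_inverted_index` and `db_ir_query` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: each record mixes a structured attribute with text.
RECORDS = {
    1: {"year": 2005, "text": "tight coupling of database and retrieval"},
    2: {"year": 2013, "text": "parallel search engine with database features"},
    3: {"year": 2013, "text": "spatial database features"},
}


def build_inverted_index(records):
    """Map each keyword to the sorted list of record IDs whose text contains it."""
    index = defaultdict(set)
    for rid, rec in records.items():
        for word in rec["text"].split():
            index[word].add(rid)
    return {word: sorted(ids) for word, ids in index.items()}


def db_ir_query(records, index, keyword, year):
    """Return IDs of records that contain `keyword` and have the given `year`.

    The posting list for the keyword is scanned once, and the structured
    predicate is checked for each posting (an assumed, simplified strategy).
    """
    postings = index.get(keyword, [])
    return [rid for rid in postings if records[rid]["year"] == year]


INDEX = build_inverted_index(RECORDS)
print(db_ir_query(RECORDS, INDEX, "database", 2013))
```

In this sketch the text index and the structured data live in the same process, so the keyword lookup and the attribute filter run in one pass; this loosely mirrors the motivation for tight-coupling stated in the abstract, but the actual design is described in the paper itself.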

Keywords

Tight-coupling, Information retrieval, DB-IR integration, Odysseus

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Kyu-Young Whang (1), corresponding author
  • Jae-Gil Lee (2)
  • Min-Jae Lee (1)
  • Wook-Shin Han (3)
  • Min-Soo Kim (4)
  • Jun-Sung Kim (1)

  1. Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
  2. Department of Knowledge Service Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
  3. Department of Creative IT Engineering/Department of Computer Science and Engineering, Pohang University of Science and Technology (POSTECH), Gyeongbuk, Korea
  4. Department of Information and Communication Engineering, Daegu Gyeongbuk Institute of Science & Technology (DGIST), Daegu, Korea
