Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 5740)

Abstract

Faced with growing knowledge management needs, enterprises increasingly recognize the importance of seamlessly integrating critical business information distributed across both structured and unstructured data sources. Academic research has addressed this problem, but many obstacles still stand in the way of widespread practical use. A key one is the absence of a schema in unstructured text. In this paper we present a new paradigm for information integration that overcomes this problem: Context Oriented Information Integration. The goal is to integrate unstructured data with the structured data present in the enterprise and to use the extracted information to generate actionable insights. We present two techniques that enable context oriented information integration and show how they can be used to solve real-world problems.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Mohania, M., Bhide, M., Roy, P., Chakaravarthy, V.T., Gupta, H. (2009). Context Oriented Information Integration. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems I. Lecture Notes in Computer Science, vol 5740. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03722-1_12

  • DOI: https://doi.org/10.1007/978-3-642-03722-1_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03721-4

  • Online ISBN: 978-3-642-03722-1

  • eBook Packages: Computer Science, Computer Science (R0)
