Abstract
Faced with growing knowledge management needs, enterprises increasingly recognize the importance of seamlessly integrating critical business information distributed across structured and unstructured data sources. Researchers have studied this problem, but many obstacles still prevent such integration from being widely used in practice. A key obstacle is the absence of a schema in unstructured text. In this paper we present a new paradigm for integrating information that overcomes this problem: Context Oriented Information Integration. The goal is to integrate unstructured data with the structured data present in the enterprise and to use the extracted information to generate actionable insights for the enterprise. We present two techniques that enable context oriented information integration and show how they can be used to solve real-world problems.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Mohania, M., Bhide, M., Roy, P., Chakaravarthy, V.T., Gupta, H. (2009). Context Oriented Information Integration. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems I. Lecture Notes in Computer Science, vol 5740. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03722-1_12
DOI: https://doi.org/10.1007/978-3-642-03722-1_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03721-4
Online ISBN: 978-3-642-03722-1
eBook Packages: Computer Science (R0)