
Data Integration and Access

The Digital Government Research Center’s Energy Data Collection (EDC) Project

Chapter in Advances in Digital Government

Abstract

This chapter describes the progress of the Digital Government Research Center (DGRC) in tackling the challenges of integrating and accessing the massive amount of statistical and text data available from government agencies. In particular, we address the issues of database heterogeneity, size, distribution, and control of terminology. We give an overview of our results on problems such as (1) ontological mappings for terminology standardization, (2) data integration across databases with high-speed query processing, and (3) interfaces for query input and presentation of results. The DGRC is a collaboration between researchers from Columbia University and the Information Sciences Institute of the University of Southern California. It employs technology developed at both institutions, including the SENSUS ontology, the SIMS multi-database access planner, the LEXING automated dictionary and terminology analysis system, and the main-memory query processing component, among others. The pilot application targets gasoline data from the Bureau of Labor Statistics, the Energy Information Administration of the Department of Energy, the Census Bureau, and other government agencies.




Copyright information

© 2002 Kluwer Academic Publishers

About this chapter

Cite this chapter

Ambite, J.L. et al. (2002). Data Integration and Access. In: McIver, W.J., Elmagarmid, A.K. (eds) Advances in Digital Government. Advances in Database Systems, vol 26. Springer, Boston, MA. https://doi.org/10.1007/0-306-47374-7_5


  • DOI: https://doi.org/10.1007/0-306-47374-7_5

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4020-7067-9

  • Online ISBN: 978-0-306-47374-6

  • eBook Packages: Springer Book Archive
