WIND: A warehouse for internet data

  • Lukas C. Faulstich
  • Myra Spiliopoulou
  • Volker Linnemann
Object Orientation and The Internet
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1271)


The increasing amount of information available on the web demands sophisticated querying methods and knowledge discovery techniques. In this study, we introduce WIND, our architectural framework for a data warehouse over a domain-specific thematic section of the Internet. The aim of WIND is to provide a partially materialized, structured view of the underlying information sources, on which database querying can be applied and mining techniques can be developed. WIND loads web documents into several complementary local repositories, such as OODBMSs and text retrieval systems. This allows attribute-oriented and content-oriented query processing to be combined. Special attention is paid to domain-specific document formats. To support conversion between (semi-)structured documents and database objects, we consider a technique for generating format converters based on the notion of object-grammars.


data warehouse, web, information retrieval, format conversion, grammars





Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Lukas C. Faulstich (1)
  • Myra Spiliopoulou (2)
  • Volker Linnemann (3)
  1. Institut für Informatik, Freie Universität Berlin, Germany
  2. Institut für Wirtschaftsinformatik, Humboldt-Universität zu Berlin, Germany
  3. Institut für Informationssysteme, Medizinische Universität zu Lübeck, Germany
