The Data Warehouse of Newsgroups

  • Himanshu Gupta
  • Divesh Srivastava
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1540)

Abstract

Electronic newsgroups are one of the primary means for the dissemination, exchange, and sharing of information. We argue that the current newsgroup model is unsatisfactory, especially when posted articles are relevant to multiple newsgroups. We demonstrate that considerable additional flexibility can be achieved by managing newsgroups in a data warehouse, where each article is a tuple of attribute-value pairs and each newsgroup is a view on the set of all posted articles. Supporting this paradigm for a large set of newsgroups makes it imperative to efficiently support a very large number of views; this is the key difference between newsgroup data warehouses and conventional data warehouses. We identify two complementary problems concerning the design of such a newsgroup data warehouse. First, the system must decide which newsgroup views to eagerly maintain (i.e., materialize). We demonstrate the intractability of the general newsgroup-selection problem, consider various natural special cases of the problem, and present efficient exact and approximation algorithms as well as complexity-theoretic hardness results for them. Second, the eagerly maintained newsgroups must be incrementally maintained in an efficient way. The newsgroup-maintenance problem for our model of newsgroup definitions is a more general version of the classical point-location problem, and we design an I/O- and CPU-efficient algorithm for this problem.
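
The view-based model can be made concrete with a small sketch. For illustration only, it assumes that a newsgroup definition is a conjunction of per-attribute conditions, each of which is either an equality test or a closed numeric range. The class names, the `topic`/`date`/`length` attributes, and the choice of indexing attribute are all hypothetical and are not taken from the paper. Under that assumption, an article is a point and a range-defined newsgroup is an axis-parallel box. Maintaining the eagerly maintained newsgroups then amounts to finding every box that contains the newly posted point. The lookup below is a naive in-memory filter: it hashes on one equality attribute and then tests each candidate. It is not the I/O-efficient algorithm the paper describes.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Hypothetical model: attribute values are strings or numbers.
Value = Union[str, float]
# A condition is an exact value (equality test) or a closed (lo, hi) numeric range.
Condition = Union[str, Tuple[float, float]]


@dataclass
class Newsgroup:
    name: str
    conditions: Dict[str, Condition]  # conjunction over attributes

    def matches(self, article: Dict[str, Value]) -> bool:
        """True if the article (a point) satisfies every condition (lies in the box)."""
        for attr, cond in self.conditions.items():
            if attr not in article:
                return False
            value = article[attr]
            if isinstance(cond, tuple):
                lo, hi = cond
                if not isinstance(value, (int, float)) or not (lo <= value <= hi):
                    return False
            elif value != cond:
                return False
        return True


class NewsgroupIndex:
    """Naive stabbing index: bucket groups by an equality attribute, then filter."""

    def __init__(self, groups: List[Newsgroup], key_attr: str):
        self.key_attr = key_attr
        self.by_key: Dict[str, List[Newsgroup]] = {}
        self.unkeyed: List[Newsgroup] = []  # groups with no equality test on key_attr
        for g in groups:
            cond = g.conditions.get(key_attr)
            if isinstance(cond, str):
                self.by_key.setdefault(cond, []).append(g)
            else:
                self.unkeyed.append(g)

    def groups_for(self, article: Dict[str, Value]) -> List[str]:
        """Names of all newsgroups whose definition the article satisfies."""
        candidates = list(self.unkeyed)
        key = article.get(self.key_attr)
        if isinstance(key, str):
            candidates += self.by_key.get(key, [])
        return sorted(g.name for g in candidates if g.matches(article))


def post(article: Dict[str, Value], index: NewsgroupIndex,
         materialized: Dict[str, List[Dict[str, Value]]]) -> None:
    """Incrementally maintain only the views chosen for materialization."""
    for name in index.groups_for(article):
        if name in materialized:
            materialized[name].append(article)


groups = [
    Newsgroup("db.recent", {"topic": "databases", "date": (19980101, 19981231)}),
    Newsgroup("db.all", {"topic": "databases"}),
    Newsgroup("long.posts", {"length": (1000, float("inf"))}),
]
index = NewsgroupIndex(groups, key_attr="topic")
materialized = {"db.recent": [], "long.posts": []}  # eagerly maintained subset
article = {"topic": "databases", "date": 19980615, "length": 250}
print(index.groups_for(article))  # ['db.all', 'db.recent']
post(article, index, materialized)
print(len(materialized["db.recent"]), len(materialized["long.posts"]))  # 1 0
```

Choosing which names go into `materialized` corresponds to the newsgroup-selection decision. In this sketch, a group that is left out (here `db.all`) would have to be evaluated on demand over the stored articles.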

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Himanshu Gupta, Stanford University, Stanford, USA
  • Divesh Srivastava, AT&T Labs - Research, NJ, USA