Part of the book series: Massive Computing (MACO, volume 2)

Abstract

The explosive growth of digital repositories of information has been enabled by recent developments in communication and information technologies. The global Internet/World Wide Web exemplifies the rapid deployment of such technologies. Despite significant accomplishments in internetworking, however, scalable indexing and data-mining techniques for computational knowledge management lag behind the rapid growth of distributed collections. Hierarchical Distributed Dynamic Indexing (HDDI™) is an approach that dynamically creates a hierarchical index from distributed document collections. At each node of the hierarchical index, a knowledge base is created and subtopic regions of semantic locality in conceptual space are identified. This chapter introduces HDDI™, focusing on the model building techniques employed at each node of the hierarchy. A novel approach to information clustering based on the contextual transitivity of similarity between terms is introduced. We conclude with several example applications of HDDI™ in the textual data mining and information retrieval fields.

Copyright information

© 2001 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Pottenger, W.M., Kim, YB., Meling, D.D. (2001). HDDI™: Hierarchical Distributed Dynamic Indexing. In: Grossman, R.L., Kamath, C., Kegelmeyer, P., Kumar, V., Namburu, R.R. (eds) Data Mining for Scientific and Engineering Applications. Massive Computing, vol 2. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-1733-7_18

  • DOI: https://doi.org/10.1007/978-1-4615-1733-7_18

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4020-0114-7

  • Online ISBN: 978-1-4615-1733-7

  • eBook Packages: Springer Book Archive
