Abstract
The explosive growth of digital repositories of information has been enabled by recent developments in communication and information technologies. The global Internet/World Wide Web exemplifies the rapid deployment of such technologies. Despite significant accomplishments in internetworking, however, scalable indexing and data-mining techniques for computational knowledge management lag behind the rapid growth of distributed collections. Hierarchical Distributed Dynamic Indexing (HDDI™) is an approach that dynamically creates a hierarchical index from distributed document collections. At each node of the hierarchical index, a knowledge base is created and subtopic regions of semantic locality in conceptual space are identified. This chapter introduces HDDI™, focusing on the model-building techniques employed at each node of the hierarchy. A novel approach to information clustering based on the contextual transitivity of similarity between terms is introduced. We conclude with several example applications of HDDI™ in the fields of textual data mining and information retrieval.
Copyright information
© 2001 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Pottenger, W.M., Kim, YB., Meling, D.D. (2001). HDDI™: Hierarchical Distributed Dynamic Indexing. In: Grossman, R.L., Kamath, C., Kegelmeyer, P., Kumar, V., Namburu, R.R. (eds) Data Mining for Scientific and Engineering Applications. Massive Computing, vol 2. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-1733-7_18
DOI: https://doi.org/10.1007/978-1-4615-1733-7_18
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4020-0114-7
Online ISBN: 978-1-4615-1733-7
eBook Packages: Springer Book Archive