Abstract
This paper proposes two new character-based full-text indexing models: the adjacency matrix based inverted file and the adjacency matrix based PAT array. Formally, the former is a reorganization of the traditional inverted file, and the latter is a decomposition of the traditional PAT array. Both organize text-indexing information in the form of an adjacency matrix. Query algorithms for the new models are developed, and performance comparisons between the new and the traditional models are carried out. The new models improve query-processing efficiency considerably at the cost of extra storage overhead that is much smaller than the size of the original text database, and so are suitable for large-scale text databases, especially Chinese text databases.
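The abstract does not spell out the data structures, so the following is only an illustrative sketch of one possible reading of "organizing text-indexing information in the form of an adjacency matrix": each ordered pair of adjacent characters is treated as a matrix cell that stores the text positions where that pair occurs. The function names `build_pair_index` and `find`, the sparse dictionary representation, and the intersection-based query are assumptions made for illustration, not the authors' published algorithms.

```python
from collections import defaultdict


def build_pair_index(text):
    """Map each ordered pair of adjacent characters to its start positions.

    The dictionary acts as a sparse adjacency matrix: key (a, b) is the
    cell for "a is immediately followed by b" (an assumed layout).
    """
    index = defaultdict(list)
    for i in range(len(text) - 1):
        index[(text[i], text[i + 1])].append(i)
    return index


def find(index, query):
    """Return the sorted start positions of `query` (length >= 2)."""
    if len(query) < 2:
        raise ValueError("query must contain at least two characters")
    # Start from the occurrences of the first adjacent pair.
    candidates = set(index.get((query[0], query[1]), ()))
    # Each later pair must occur k positions after a surviving candidate.
    for k in range(1, len(query) - 1):
        positions = index.get((query[k], query[k + 1]), ())
        candidates &= {p - k for p in positions}
        if not candidates:
            break
    return sorted(candidates)


index = build_pair_index("abcabcab")
print(find(index, "bca"))
```

Under this reading, a query touches only the cells for its own adjacent character pairs rather than the full posting list of each single character, which is one plausible source of the efficiency gain the abstract claims. The same sketch works unchanged on Chinese text, since Python strings are indexed by character.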
This work was supported by China Postdoctoral Science Foundation and National 863 Hi-Tech Foundation (No. 863-306-ZT04-02-2).
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhou, S., Guan, J., Hu, Y., Hu, J., Zhou, A. (2001). Adjacency Matrix Based Full-Text Indexing Models. In: Wang, X.S., Yu, G., Lu, H. (eds) Advances in Web-Age Information Management. WAIM 2001. Lecture Notes in Computer Science, vol 2118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47714-4_6
DOI: https://doi.org/10.1007/3-540-47714-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42298-3
Online ISBN: 978-3-540-47714-3
eBook Packages: Springer Book Archive