Efficient System for Clustering of Dynamic Document Database
In this paper we describe a system that groups and classifies documents and finds latent semantic features in a database composed of a large number of documents. The database grows constantly as the users who co-create it add new documents. Users require the system to provide information about both a specific document and the entire set of documents. This information includes statistical data about words in documents, information about the aspects in which these words appear, classification, clustering, etc.
To meet these expectations, we propose using methods that search for hidden patterns in multivariate data. We apply machine learning algorithms for data analysis that are useful in identifying local patterns in such data. We consider two algorithms described in the literature: (1) Probabilistic Latent Semantic Analysis (PLSA) and (2) Nonnegative Matrix Factorization (NMF), which has also been used in a text analysis system.
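As a rough illustration of how NMF can expose latent features in a term-document matrix, the following minimal sketch uses the standard multiplicative update rule for the Frobenius-norm objective. This is not necessarily the variant used in the system described here. The function name `nmf`, the toy matrix `V`, the rank, and the iteration count are all assumptions chosen for illustration.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix V (terms x documents) as V ~ W @ H.

    Uses multiplicative updates for the squared Frobenius error.
    W holds term weights per latent feature; H holds feature weights
    per document. All names and defaults here are illustrative.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = V.shape
    # Strictly positive random initialisation keeps the updates well defined.
    W = rng.random((n_terms, rank)) + eps
    H = rng.random((rank, n_docs)) + eps
    for _ in range(n_iter):
        # eps in the denominators guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data: rows are terms, columns are documents.
V = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 3],
], dtype=float)

W, H = nmf(V, rank=2)

# Assign each document to the latent feature with the largest weight.
clusters = H.argmax(axis=0)

# Relative reconstruction error of the rank-2 approximation.
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In this sketch each column of `W` can be read as a latent semantic feature (a weighted group of terms), and `clusters` gives a simple hard clustering of the documents. For a growing database, one possible design is to keep `W` fixed and solve only for the `H` columns of newly added documents, but that is an assumption rather than something stated above.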
Keywords: clustering, classification, NMF, semantic features, document database