Concept Features Extraction and Text Clustering Analysis of Neural Networks Based on Cognitive Mechanism
Feature selection is an important part of automatic text classification. In this paper, we use HowNet to extract concept attributes and propose the CHI-MCOR method to build a feature set. This method selects not only frequently occurring words but also words of medium or low frequency that are important for text classification. The combined method performs much better than any single weighting method on its own. We then use a Self-Organizing Map (SOM) to perform automatic text clustering. The experimental results show that, if sememes are extracted properly, we can both reduce the feature dimension and improve classification precision. SOM can be applied to large-scale text clustering, and the clustering results are good when concept features are selected.
Keywords: Natural Language Processing; Text Classification; Chinese Word; Concept Attribute; Concept Feature
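The abstract does not define CHI-MCOR in detail, so the following is only a minimal sketch of the standard chi-square (CHI) term statistic that the method's name suggests as one component. The function names `chi_square` and `chi_scores`, the toy documents, and the choice of taking the maximum score over classes are all assumptions for illustration, not the authors' implementation; the MCOR component and the HowNet sememe mapping are not sketched because the abstract gives no formula for them.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (e.g. the term occurs in every document).
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def chi_scores(docs, labels):
    """Score each term by its maximum chi-square value over all classes.

    docs: list of token lists; labels: list of class labels, same length.
    Taking the maximum over classes is an assumed aggregation choice.
    """
    doc_sets = [set(tokens) for tokens in docs]
    vocab = set().union(*doc_sets)
    class_size = Counter(labels)
    n_docs = len(docs)
    scores = {}
    for term in vocab:
        best = 0.0
        for cls in class_size:
            a = sum(1 for s, y in zip(doc_sets, labels) if y == cls and term in s)
            b = sum(1 for s, y in zip(doc_sets, labels) if y != cls and term in s)
            c = class_size[cls] - a
            d = n_docs - class_size[cls] - b
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best
    return scores


# Hypothetical toy corpus: two classes, two documents each.
docs = [["ball", "game"], ["ball", "team"], ["stock", "market"], ["stock", "game"]]
labels = ["sport", "sport", "finance", "finance"]
scores = chi_scores(docs, labels)
```

In this toy corpus, a term that appears only in one class (such as "ball") receives a high score, while a term spread evenly across classes (such as "game") scores zero. A feature set could then be formed by keeping the top-scoring terms, or, as the abstract describes, concept attributes in place of raw words.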