Text Clustering

Li, Hua

doi:10.1007/978-1-4614-8265-9_415



Definition

Text clustering automatically groups textual documents (for example, plain-text documents, web pages, and emails) into clusters based on the similarity of their content. The problem can be defined as follows. Given a set DS of n documents and a predefined number of clusters K (usually set by the user), DS is divided into K document clusters $DS_1, DS_2, \dots, DS_K$ with $DS_1 \cup DS_2 \cup \dots \cup DS_K = DS$. The division is chosen so that documents in the same cluster are similar to one another, while documents from different clusters are dissimilar [14].
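
The definition above can be illustrated with a minimal sketch. It assumes three common choices that are swapped in here for illustration and are not the method of this entry: a bag-of-words representation (term-frequency vectors scaled to unit length), cosine similarity, and spherical k-means as the clustering algorithm. All function names are hypothetical. Given documents DS and a user-chosen K, the sketch returns K lists of document indices whose union covers DS.

```python
import math
import random
from collections import Counter

# Assumed, illustrative choices: unit-length term-frequency vectors,
# cosine similarity, and spherical k-means. Not the entry's own method.


def unit_tf_vector(text):
    """Term-frequency vector of a document, scaled to unit length (sparse dict)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {term: c / norm for term, c in counts.items()}


def dot(u, v):
    """Dot product of sparse vectors; equals cosine similarity for unit vectors."""
    return sum(w * v.get(term, 0.0) for term, w in u.items())


def unit_centroid(vectors):
    """Sum of member vectors rescaled to unit length (spherical k-means centroid)."""
    total = {}
    for vec in vectors:
        for term, w in vec.items():
            total[term] = total.get(term, 0.0) + w
    norm = math.sqrt(sum(w * w for w in total.values()))
    if norm == 0:
        return total
    return {term: w / norm for term, w in total.items()}


def cluster_documents(docs, k, iterations=20, seed=0):
    """Divide docs into k clusters DS_1..DS_k, returned as lists of document indices."""
    vectors = [unit_tf_vector(d) for d in docs]
    # Seed the centroids with k distinct documents chosen at random.
    rng = random.Random(seed)
    centroids = [vectors[i] for i in rng.sample(range(len(docs)), k)]
    assignment = None
    for _ in range(iterations):
        # Assign each document to its most similar centroid (ties go to the lowest index).
        new_assignment = [
            max(range(k), key=lambda j: dot(vec, centroids[j])) for vec in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no document changed cluster
        assignment = new_assignment
        # Recompute each centroid from its current members.
        for j in range(k):
            members = [vectors[i] for i, c in enumerate(assignment) if c == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = unit_centroid(members)
    return [[i for i, c in enumerate(assignment) if c == j] for j in range(k)]


if __name__ == "__main__":
    docs = [
        "database query index",
        "query optimization index database",
        "soccer match goal",
        "goal scored in soccer match",
    ]
    clusters = cluster_documents(docs, k=2)
    # Cluster order depends on the seed; sorting makes the grouping easy to read.
    print(sorted(clusters))  # → [[0, 1], [2, 3]]
```

Because every document is assigned to exactly one cluster, the returned clusters cover DS as the definition requires. Real systems typically add stop-word removal, term weighting such as TF-IDF, and several random restarts.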

Historical Background

Text clustering was initially developed to improve the performance of search engines by pre-clustering the entire corpus [2]. It was later also investigated as a post-retrieval document-browsing technique [1, 2, 7].

Foundations

Text clustering involves several important components, including document representation, clustering algorithms, and performance measures…
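
The entry's discussion of these components is truncated here. The sketch below illustrates only the last component. It computes purity, one commonly used external measure that compares a clustering with reference class labels. The choice of purity, the function name `purity`, and the toy labels are assumptions for illustration, and purity may not be among the measures the entry goes on to describe.

```python
from collections import Counter


def purity(clusters, labels):
    """Fraction of documents that belong to the majority reference class of their cluster.

    clusters: list of clusters, each a list of document indices.
    labels:   labels[i] is the reference (e.g. hand-assigned) class of document i.
    """
    n = sum(len(cluster) for cluster in clusters)
    if n == 0:
        return 0.0
    majority_total = 0
    for cluster in clusters:
        if cluster:
            # Size of the largest reference class inside this cluster.
            majority_total += Counter(labels[i] for i in cluster).most_common(1)[0][1]
    return majority_total / n


if __name__ == "__main__":
    labels = ["db", "db", "sports", "sports", "sports"]
    print(purity([[0, 1, 2], [3, 4]], labels))  # → 0.8
```

Purity rises toward 1 as each cluster becomes dominated by a single class. It can be trivially maximized by putting every document in its own cluster, so it is usually reported alongside other measures.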


Author information

Authors and Affiliations

Microsoft Research Asia, Beijing, China
Hua Li


Corresponding author

Correspondence to Hua Li.

Editor information

Editors and Affiliations

Georgia Institute of Technology College of Computing, Atlanta, GA, USA
Ling Liu
University of Waterloo School of Computer Science, Waterloo, ON, Canada
M. Tamer Özsu

Section Editor information

Microsoft Research Asia, Microsoft Corporation, Beijing, Haidian, China
Zheng Chen


About this entry

Cite this entry

Li, H. (2018). Text Clustering. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_415


DOI: https://doi.org/10.1007/978-1-4614-8265-9_415
Published: 07 December 2018
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering
