Joint European Conference on Machine Learning and Knowledge Discovery in Databases

ECML PKDD 2015: Machine Learning and Knowledge Discovery in Databases pp 20-36

Clustering by Intent: A Semi-Supervised Method to Discover Relevant Clusters Incrementally

Conference paper

DOI: 10.1007/978-3-319-23461-8_2

Part of the Lecture Notes in Computer Science book series (LNCS, volume 9286)
Cite this paper as:
Forman G., Nachlieli H., Keshet R. (2015) Clustering by Intent: A Semi-Supervised Method to Discover Relevant Clusters Incrementally. In: Bifet A. et al. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2015. Lecture Notes in Computer Science, vol 9286. Springer, Cham

Abstract

Our business users have often been frustrated with clustering results that do not suit their purpose; when trying to discover clusters of product complaints, the algorithm may return clusters of product models instead. The fundamental issue is that complex text data can be clustered in many different ways, and, really, it is optimistic to expect relevant clusters from an unsupervised process, even with parameter tinkering.

We studied this problem in an interactive context and developed an effective solution that re-casts the problem formulation, radically different from traditional or semi-supervised clustering. Given training labels of some known classes, our method incrementally proposes complementary clusters. In tests on various business datasets, we consistently get relevant results at interactive time scales. This paper describes the method and demonstrates its superior ability using publicly available datasets. For automated evaluation, we devised a unique cluster evaluation framework to match the business user’s utility.

Keywords

Semi-supervised clustering; Class discovery; Topic detection

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Hewlett-Packard Labs, Palo Alto, USA
  2. Hewlett-Packard Labs, Haifa, Israel
