The Efficient Computation of Complete and Concise Substring Scales with Suffix Trees
Strings are an important part of most real application multi-valued contexts. Their conceptual treatment requires the definition of substring scales, i.e., sets of relevant substrings, so as to form informative concepts. However these scales are either defined by hand, or derived in a context-unaware manner (e.g., all words occuring in string values). We present an efficient algorithm based on suffix trees that produces complete and concise substring scales. Completeness ensures that every possible concept is formed, like when considering the scale of all substrings. Conciseness ensures the number of scale attributes (substrings) is less than the cumulated size of all string values. This algorithm is integrated in Camelis, and illustrated on the set of all ICCS paper titles.
Unable to display preview. Download preview PDF.
- [Kuz99]Kuznetsov, S.: Learning of simple conceptual graphs from positive and negative examples. In: Żytkow, J.M., Rauch, J. (eds.) Principles of Data Mining and Knowledge Discovery. LNCS (LNAI), vol. 1704, pp. 384–391. Springer, Heidelberg (1999)Google Scholar