Czech Text Segmentation Using Voting Experts and Its Comparison with Menzerath-Altmann law

  • Tomáš Kocyan
  • Jan Martinovič
  • Jiří Dvorský
  • Václav Snášel
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 245)

Abstract

The choice of word alphabet is connected to many problems in information retrieval. Information retrieval algorithms usually do not process the input data as a sequence of bytes; they work with larger pieces of the data, such as words or, more generally, chunks of the data. This is the main motivation of the paper: how can the input data be split into smaller chunks when its structure is not known a priori? To do this, we use the Voting Experts algorithm, which is often used to process time series data, audio signals, etc. Our intention is to use the Voting Experts algorithm in the future for segmenting discrete data such as DNA or proteins. For test purposes, we use Czech and English text as a test bed for the segmentation algorithm. We use the Menzerath-Altmann law to compare the segmentation results.
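
As a rough illustration of the kind of unsupervised chunking the abstract refers to, the following is a minimal Python sketch of a simplified Voting Experts variant. It is not the authors' implementation: the function names, the `window` and `threshold` parameters, and the choice of a frequency expert and a boundary-entropy expert voting only on interior split points are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def ngram_stats(text, max_len):
    """Count every n-gram up to max_len and the symbols that follow it."""
    counts = Counter()
    followers = defaultdict(Counter)
    for i in range(len(text)):
        for n in range(1, max_len + 1):
            if i + n > len(text):
                break
            gram = text[i:i + n]
            counts[gram] += 1
            if i + n < len(text):
                followers[gram][text[i + n]] += 1
    return counts, followers


def boundary_entropy(follower_counts):
    """Entropy (in bits) of the distribution of the next symbol."""
    total = sum(follower_counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total)
                for c in follower_counts.values())


def standardize(values):
    """Z-score the values separately for each n-gram length."""
    by_len = defaultdict(list)
    for gram, v in values.items():
        by_len[len(gram)].append(v)
    stats = {}
    for n, vals in by_len.items():
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        stats[n] = (mean, math.sqrt(var) or 1.0)  # avoid division by zero
    return {g: (v - stats[len(g)][0]) / stats[len(g)][1]
            for g, v in values.items()}


def voting_experts(text, window=4, threshold=2):
    """Split text into chunks using two voting experts (simplified)."""
    if window < 2:
        raise ValueError("window must be at least 2")
    counts, followers = ngram_stats(text, window)
    z_freq = standardize({g: float(c) for g, c in counts.items()})
    z_ent = standardize({g: boundary_entropy(followers[g]) for g in counts})

    votes = [0] * (len(text) + 1)
    for start in range(len(text) - window + 1):
        w = text[start:start + window]
        splits = range(1, window)
        # Frequency expert: prefer a split whose two halves are both frequent.
        best_f = max(splits, key=lambda i: z_freq[w[:i]] + z_freq[w[i:]])
        # Entropy expert: prefer a split after a prefix with an
        # unpredictable continuation.
        best_e = max(splits, key=lambda i: z_ent[w[:i]])
        votes[start + best_f] += 1
        votes[start + best_e] += 1

    # Cut where the vote count is a strict local maximum above the threshold.
    cuts = [i for i in range(1, len(text))
            if votes[i] >= threshold
            and votes[i] > votes[i - 1]
            and votes[i] > votes[i + 1]]
    bounds = [0] + cuts + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    sample = "itwasabrightcolddayinapril" * 3
    print(voting_experts(sample))
```

The sketch only guarantees that the returned chunks concatenate back to the input; how well the cuts match real word boundaries depends on the data and on the assumed `window` and `threshold` values.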

Keywords

Voting Experts, Text Segmentation

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Tomáš Kocyan (1)
  • Jan Martinovič (1)
  • Jiří Dvorský (1)
  • Václav Snášel (1)
  1. Faculty of Electrical Engineering and Computer Science, VŠB - Technical University of Ostrava, Ostrava, Czech Republic
