Learning to Summarize Time Series Data

  • Pranay Kumar Venkata Sowdaboina
  • Sutanu Chakraborti
  • Somayajulu Sripada
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8403)

Abstract

In this paper we focus on content selection for summarizing time series data using machine learning techniques. The goal is to exploit a parallel corpus to predict the level of abstraction appropriate for a given summarization task. This is an important step towards building an automated Natural Language Generation (NLG) system that generates text for unseen data. Machine learning approaches are used to induce the underlying rules for text summarization, which are potentially close to the ones that humans use when writing textual summaries. We present an approach for selecting important points in a time series that can aid in generating captions or textual summaries. We evaluate our techniques on a parallel corpus of human-generated weather forecast texts and the corresponding numerical weather prediction data.
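To make the idea of learning content selection from a parallel corpus concrete, the following is a minimal sketch and not the method of the paper. It assumes each training example pairs a numeric series with boolean labels saying whether a human-written text mentioned that point, and it learns a single threshold on the point-to-point change. The function names (`point_features`, `learn_threshold`, `select_points`) and the toy data are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical toy "parallel corpus": each entry is (series, labels), where
# labels[i] is True if the human-written text mentioned point i.
EXAMPLES: List[Tuple[List[float], List[bool]]] = [
    ([10.0, 10.0, 12.0, 20.0, 21.0, 20.0], [False, False, False, True, False, False]),
    ([5.0, 6.0, 15.0, 15.0, 14.0], [False, False, True, False, False]),
]


def point_features(series: List[float]) -> List[float]:
    """Absolute change from the previous point (0.0 for the first point)."""
    return [0.0] + [abs(b - a) for a, b in zip(series, series[1:])]


def learn_threshold(examples: List[Tuple[List[float], List[bool]]]) -> float:
    """Pick the change threshold that best reproduces the human selections.

    Every observed feature value is tried as a candidate threshold; the
    smallest candidate with the highest number of agreeing points wins.
    """
    pairs = [
        (feature, label)
        for series, labels in examples
        for feature, label in zip(point_features(series), labels)
    ]
    best_threshold, best_correct = 0.0, -1
    for candidate in sorted({feature for feature, _ in pairs}):
        correct = sum((feature >= candidate) == label for feature, label in pairs)
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold


def select_points(series: List[float], threshold: float) -> List[int]:
    """Indices of points whose change from the previous point meets the threshold."""
    return [i for i, f in enumerate(point_features(series)) if f >= threshold]


if __name__ == "__main__":
    threshold = learn_threshold(EXAMPLES)
    # Apply the learned rule to an unseen series.
    print(threshold, select_points([8.0, 9.0, 18.0, 18.0, 7.0], threshold))
```

A single learned threshold is only the simplest possible stand-in for an induced selection rule; a realistic system would use richer features and a proper classifier.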

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Pranay Kumar Venkata Sowdaboina (1)
  • Sutanu Chakraborti (1)
  • Somayajulu Sripada (2)
  1. Department of Computer Science, Indian Institute of Technology Madras, India
  2. Computing Science, University of Aberdeen, UK
