DOTS: Drift Oriented Tool System

  • Conference paper
Neural Information Processing (ICONIP 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9492)

Abstract

Drift is a given in most machine learning applications. The idea that models must accommodate change, and thus be dynamic, is ubiquitous. Current challenges include temporal data streams, drift, and non-stationary scenarios, often involving text data, whether in social networks or in business systems. Drift patterns come in several types: concepts may appear and disappear suddenly, recurrently, gradually, or incrementally. Researchers strive to propose and test algorithms and techniques to deal with drift in text classification, but adequate benchmarks for such dynamic environments are hard to find.

In this paper we present DOTS (Drift Oriented Tool System), a framework for defining and generating text-based datasets in which drift characteristics can be thoroughly defined, implemented, and tested. We demonstrate the usefulness of DOTS through a Twitter stream case study, in which DOTS is used to define datasets and test the effectiveness of different document representations. Results show the potential of DOTS in machine learning research.
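The drift pattern types named in the abstract (sudden, recurrent, gradual, incremental) can be pictured as the relative frequency of one concept over a sequence of time steps. The sketch below is only an illustration and is not the DOTS interface: the function `drift_profile`, its parameters, and the exact shapes chosen for each pattern (in particular a linear ramp for "gradual" and a staircase for "incremental") are assumptions made here, not definitions taken from the paper.

```python
PATTERNS = ("sudden", "recurrent", "gradual", "incremental")


def drift_profile(pattern, n_steps, start, end, period=2, levels=4):
    """Return the relative frequency (0.0 to 1.0) of one concept per time step.

    Hypothetical helper for illustration only. The concept is absent outside
    the half-open window [start, end); inside it, the shape depends on
    `pattern`. The shapes are assumptions, not the paper's definitions.
    """
    if pattern not in PATTERNS:
        raise ValueError(f"unknown drift pattern: {pattern!r}")
    if not 0 <= start < end <= n_steps:
        raise ValueError("require 0 <= start < end <= n_steps")

    span = end - start
    profile = []
    for t in range(n_steps):
        if t < start or t >= end:
            freq = 0.0  # concept not present outside its window
        elif pattern == "sudden":
            freq = 1.0  # appears at full strength, then vanishes at `end`
        elif pattern == "recurrent":
            # alternate `period` steps on, `period` steps off
            freq = 1.0 if ((t - start) // period) % 2 == 0 else 0.0
        elif pattern == "gradual":
            # assumed shape: linear ramp up to full strength
            freq = (t - start + 1) / span
        else:  # "incremental"
            # assumed shape: staircase with `levels` equal plateaus
            plateau = (t - start) * levels // span
            freq = (plateau + 1) / levels
        profile.append(freq)
    return profile


if __name__ == "__main__":
    for name in PATTERNS:
        print(name, drift_profile(name, n_steps=8, start=0, end=8))
```

Multiplying such a profile by a maximum number of documents per time step would give a per-step document count for the concept, which is one possible way to turn a drift specification into a dataset layout.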

Notes

  1. http://www.lemurproject.org/

Acknowledgment

We gratefully acknowledge the iCIS project (CENTRO-07-ST24-FEDER - 107002003).

Author information

Corresponding author

Correspondence to Joana Costa.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Costa, J., Silva, C., Antunes, M., Ribeiro, B. (2015). DOTS: Drift Oriented Tool System. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science, vol 9492. Springer, Cham. https://doi.org/10.1007/978-3-319-26561-2_72

  • DOI: https://doi.org/10.1007/978-3-319-26561-2_72

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26560-5

  • Online ISBN: 978-3-319-26561-2

  • eBook Packages: Computer Science, Computer Science (R0)
