Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 190))

Abstract

Sky surveys in astronomy are expected to generate 2.5 petabytes of data a year. Electronic medical records hold the promise of treatment comparisons and of grouping patients by outcomes, but they will be held in petabyte-scale data stores. We can store large amounts of data, and much of it won't have labels. How can we analyze or explore such data? Unsupervised clustering, whether fuzzy, possibilistic, or probabilistic, allows us to group data. However, these algorithms scale poorly in computation time as the data grows, and they are impractical without modification when the data exceeds the size of memory. We explore distributed clustering, stream data clustering, and subsampling approaches to enable scalable clustering. Examples show that one can build good models of the data without necessarily seeing all of it and that, if needed, modified algorithms can be applied to terabytes of data and more.
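As an illustration of the subsampling idea mentioned in the abstract, the following is a minimal sketch, not the chapter's own algorithm. It assumes textbook fuzzy c-means with fuzzifier m = 2 and squared Euclidean distance, run on a uniform random subsample of an in-memory list of points. The function names (`memberships`, `fuzzy_c_means`, `cluster_subsample`) and the `fraction` parameter are hypothetical and chosen only for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(point, centers, m=2.0):
    """Fuzzy membership of one point in each cluster; the values sum to 1."""
    d = [sq_dist(point, c) for c in centers]
    # A point lying exactly on one or more centers is shared among them.
    if any(v == 0.0 for v in d):
        hits = [1.0 if v == 0.0 else 0.0 for v in d]
        total = sum(hits)
        return [h / total for h in hits]
    inv = [v ** (-1.0 / (m - 1.0)) for v in d]
    total = sum(inv)
    return [v / total for v in inv]


def fuzzy_c_means(data, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Alternate membership and center updates until the centers stop moving."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(data, c)]  # initial centers from data
    dim = len(data[0])
    for _ in range(iters):
        num = [[0.0] * dim for _ in range(c)]
        den = [0.0] * c
        for p in data:
            u = memberships(p, centers, m)
            for i in range(c):
                w = u[i] ** m  # each point is weighted by membership^m
                den[i] += w
                for k in range(dim):
                    num[i][k] += w * p[k]
        new = [[num[i][k] / den[i] for k in range(dim)] for i in range(c)]
        shift = max(sq_dist(a, b) for a, b in zip(centers, new))
        centers = new
        if shift < tol:  # largest squared center movement is below tolerance
            break
    return centers


def cluster_subsample(data, c, fraction=0.1, seed=0):
    """Cluster only a uniform random fraction of the data (assumed strategy)."""
    rng = random.Random(seed)
    n = max(c, int(len(data) * fraction))
    sample = rng.sample(data, n)
    return fuzzy_c_means(sample, c, seed=seed)
```

A call such as `cluster_subsample(points, c=2, fraction=0.1)` returns cluster centers estimated from roughly a tenth of the points, and `memberships(p, centers)` can then assign soft memberships to any point, including those never sampled. Whether a given fraction is large enough depends on the data; the sketch makes no claim about that.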

Author information

Corresponding author

Correspondence to Lawrence O. Hall.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hall, L.O. (2013). Exploring Big Data with Scalable Soft Clustering. In: Kruse, R., Berthold, M., Moewes, C., Gil, M., Grzegorzewski, P., Hryniewicz, O. (eds) Synergies of Soft Computing and Statistics for Intelligent Data Analysis. Advances in Intelligent Systems and Computing, vol 190. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33042-1_2

  • DOI: https://doi.org/10.1007/978-3-642-33042-1_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33041-4

  • Online ISBN: 978-3-642-33042-1

  • eBook Packages: Engineering (R0)
