Data Partitioning—Empirical Approach

Part of the Studies in Computational Intelligence book series (SCI, volume 800)

Abstract

This chapter proposes a new empirical approach, named autonomous data partitioning, which partitions the data autonomously by creating a Voronoi tessellation around objectively identified prototypes. The resulting data clouds transform a large amount of raw data into a much smaller, manageable number of more representative aggregations with semantic meaning. The proposed algorithm has two versions: offline and evolving. The offline version is based on the ranks of the observations in terms of their multimodal typicality values and local ensemble properties. The evolving version is designed for streaming data processing and works with the data density. It can start “from scratch”, but it can also be combined with the offline version to form a hybrid. Moreover, an algorithm is proposed that guarantees the local optimality of the autonomous data partitioning approach. The result is a locally optimal structure of data clouds, represented by their focal points (prototypes), which is ready to be used for analysis, for building a multi-model classifier, predictor or controller, or for fault isolation.
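
The core idea of forming data clouds by a Voronoi tessellation can be illustrated with a minimal Python sketch. It is not the chapter's algorithm: the prototypes are assumed to be given (the typicality-based and density-based identification steps are not shown), Euclidean distance is an assumption, and the refinement loop is a plain k-means-style alternation used here only as a stand-in for a procedure that stops at a locally optimal partition. The function names and the toy data are hypothetical.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def form_data_clouds(points, prototypes):
    """Assign each point to its nearest prototype (a Voronoi partition)."""
    clouds = [[] for _ in prototypes]
    for p in points:
        nearest = min(range(len(prototypes)), key=lambda i: dist(p, prototypes[i]))
        clouds[nearest].append(p)
    return clouds


def mean_point(cloud):
    """Component-wise mean of a non-empty list of points."""
    n = len(cloud)
    return tuple(sum(coords) / n for coords in zip(*cloud))


def refine(points, prototypes, max_iter=100):
    """Alternate assignment and mean update until the focal points stop moving.

    Assumption: a k-means-style stand-in, not the chapter's own procedure.
    """
    prototypes = [tuple(p) for p in prototypes]
    for _ in range(max_iter):
        clouds = form_data_clouds(points, prototypes)
        # An empty cloud keeps its previous focal point.
        updated = [mean_point(c) if c else old for c, old in zip(clouds, prototypes)]
        if updated == prototypes:
            break
        prototypes = updated
    return prototypes, form_data_clouds(points, prototypes)


# Hypothetical toy data: two well-separated groups in 2-D.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
initial_prototypes = [(0.0, 0.0), (10.0, 10.0)]  # assumed given, not derived from typicality

focal_points, clouds = refine(points, initial_prototypes)
print(focal_points)
print([len(c) for c in clouds])
```

In this sketch each data cloud is simply the set of points that fall in one Voronoi cell, so no cloud shape or distribution is imposed in advance; only the focal points need to be stored to reproduce the partition.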

Keywords

  • Multimodal Typicality
  • Cloud Data
  • Data Partitioning Approach
  • Voronoi Tessellation
  • Ensemble Properties

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Chapter
  • DOI: 10.1007/978-3-030-02384-3_7
  • Chapter length: 24 pages
eBook
  • ISBN: 978-3-030-02384-3
Author information

Corresponding author

Correspondence to Plamen P. Angelov.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Angelov, P.P., Gu, X. (2019). Data Partitioning—Empirical Approach. In: Empirical Approach to Machine Learning. Studies in Computational Intelligence, vol 800. Springer, Cham. https://doi.org/10.1007/978-3-030-02384-3_7
