Abstract
This chapter proposes a new empirical approach, named autonomous data partitioning, which partitions data autonomously by creating a Voronoi tessellation around objectively identified prototypes. The resulting data clouds transform a large amount of raw data into a much smaller (manageable) number of more representative aggregations with semantic meaning. The proposed algorithm has two versions: offline and evolving. The offline version is based on ranking the observations by their multimodal typicality values and local ensemble properties. The evolving version is designed for streaming data and works with the data density; it can start “from scratch”, but can also be combined with the offline version to form a hybrid. Moreover, an algorithm is proposed that guarantees the local optimality of the partitioning, so that the approach ends with a locally optimal structure of data clouds represented by their focal points (prototypes). This structure is then ready to be used for analysis, for building a multi-model classifier, predictor or controller, or for fault isolation.
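As a rough illustration of the two ideas named in the abstract, the sketch below forms data clouds by assigning every sample to its nearest prototype (the cells of a Euclidean Voronoi tessellation) and then refines the focal points until the assignment stops changing. It is not the chapter's algorithm: the prototypes are simply supplied by hand rather than identified from typicality or density, the refinement is a plain k-means-style alternation chosen as a stand-in for a local-optimality step, and the names `form_data_clouds` and `refine_prototypes` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def form_data_clouds(data: Sequence[Point],
                     prototypes: Sequence[Point]) -> Dict[int, List[Point]]:
    """Assign each sample to its nearest prototype (Euclidean Voronoi cells)."""
    clouds: Dict[int, List[Point]] = {i: [] for i in range(len(prototypes))}
    for x in data:
        # Ties are resolved in favour of the prototype with the lowest index.
        nearest = min(range(len(prototypes)),
                      key=lambda i: math.dist(x, prototypes[i]))
        clouds[nearest].append(x)
    return clouds


def refine_prototypes(data: Sequence[Point],
                      prototypes: Sequence[Point],
                      max_iter: int = 100) -> List[Point]:
    """Alternate assignment and mean update until the focal points are stable.

    Assumption: a k-means-style alternation, used here only as a stand-in
    for the local-optimality procedure mentioned in the abstract.
    """
    current = [tuple(p) for p in prototypes]
    for _ in range(max_iter):
        clouds = form_data_clouds(data, current)
        updated = [
            # Mean of the cloud members; an empty cloud keeps its old focal point.
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else current[i]
            for i, members in clouds.items()
        ]
        if updated == current:
            break
        current = updated
    return current


# Toy data: two well-separated groups and two hand-picked prototypes.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (9.0, 9.0), (10.0, 9.0), (9.0, 10.0)]
prototypes = [(0.0, 0.0), (9.0, 9.0)]

clouds = form_data_clouds(data, prototypes)
focal_points = refine_prototypes(data, prototypes)
```

On this toy data each prototype collects the three samples of its own group, and the refined focal points are the group means. How the prototypes are found in the first place is the part this sketch leaves out.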
Keywords
- Multimodal Typicality
- Cloud Data
- Data Partitioning Approach
- Voronoi Tessellation
- Ensemble Properties
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Angelov, P.P., Gu, X. (2019). Data Partitioning—Empirical Approach. In: Empirical Approach to Machine Learning. Studies in Computational Intelligence, vol 800. Springer, Cham. https://doi.org/10.1007/978-3-030-02384-3_7
DOI: https://doi.org/10.1007/978-3-030-02384-3_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02383-6
Online ISBN: 978-3-030-02384-3
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)