Incremental Hierarchical Clustering of Stochastic Pattern-Based Symbolic Data

  • Xin Xu
  • Jiaheng Lu
  • Wei Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9652)


Classic data analysis techniques generally assume that each variable takes a single value. However, data complexity in the age of big data has gone beyond this classic framework: a variable's value may instead take the form of a set of stochastic measurements. We refer to this case as stochastic pattern-based symbolic data, where each measurement set is an instance of an underlying stochastic pattern. In such a case, no existing classic data analysis approach, such as the crystal item or fuzzy region ones, applies. For this reason, we put forward a novel Incremental Hierarchical Clustering algorithm for stochastic Pattern-based Symbolic Data (IHCPSD). IHCPSD is robust to overlapping and missing measurements and is well adapted to incremental learning. Experiments on synthetic data and an application to real-life emitter parameter data have validated its effectiveness.


Symbolic data analysis · Stochastic pattern · Incremental learning · Hierarchical clustering · Emitter parameter analysis



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Science and Technology on Information System Engineering Laboratory, NRIEE, Nanjing, China
  2. Department of Computer Science, University of Helsinki, Helsinki, Finland
  3. State Key Laboratory for Novel Software and Technology, Nanjing University, Nanjing, China
