20+ Years of Analytics on Complex Data: Impact, Issues, Challenges and Contributions

  • Stefano Basta
  • Giuseppe Manco (email author)
  • Elio Masciari
  • Luigi Pontieri
Part of the Studies in Big Data book series (SBD, volume 31)


Computer Science is a relatively young discipline, but in the last two decades advances in hardware technology and software engineering have induced notable changes in the way users interact with computers. In particular, several processes involving data have changed radically. The amount of data stored in repositories has grown at impressive rates due to the rise of data sources such as sensor networks, social networks and operational processes. Moreover, the heterogeneity of data has dramatically increased. In short, data and their management have become more and more complex.


Keywords: Graphic Processing Unit, Recommender System, Outlier Detection, Concept Drift, Process Instance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Stefano Basta (1)
  • Giuseppe Manco (1, email author)
  • Elio Masciari (1)
  • Luigi Pontieri (1)
  1. ICAR-CNR, Rende, Italy
