Abstract
This article concisely describes a sample of distinct approaches to, and applications of, Relational Data Mining that address the problem of managing complex, and possibly very large, amounts of data. Specifically, we present a brief review of the literature on Relational Data Mining in the fields of Spatial Data Mining, Process Mining, Network Data Analysis and Stream Data Mining, with an emphasis on Italian research. For each field, we describe the milestones that have been reached, as well as the future research trends fuelled by the emerging ubiquity of Big Data.
Notes
- 1. Funded by the Ministry of Land, Infrastructure and Transport, South Korea.
Acknowledgements
The research described in this paper has been funded by the European project MAESTRA - Learning from Massive, Incompletely annotated, and Structured Data (Grant number ICT-2013-612944) and by the European H2020 project “TOREADOR - TrustwOrthy model-awaRE Analytics Data platform” (Grant number 688797).
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Appice, A., Ceci, M., Malerba, D. (2018). Relational Data Mining in the Era of Big Data. In: Flesca, S., Greco, S., Masciari, E., Saccà, D. (eds) A Comprehensive Guide Through the Italian Database Research Over the Last 25 Years. Studies in Big Data, vol 31. Springer, Cham. https://doi.org/10.1007/978-3-319-61893-7_19
DOI: https://doi.org/10.1007/978-3-319-61893-7_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-61892-0
Online ISBN: 978-3-319-61893-7
eBook Packages: Engineering, Engineering (R0)