How Data Mining and Machine Learning Evolved from Relational Data Base to Data Science

  • G. Amato
  • L. Candela
  • D. Castelli
  • A. Esuli
  • F. Falchi
  • C. Gennaro
  • F. Giannotti
  • A. Monreale
  • M. Nanni
  • P. Pagano
  • L. Pappalardo
  • D. Pedreschi
  • F. Pratesi
  • F. Rabitti
  • S. Rinzivillo
  • G. Rossetti
  • S. Ruggieri
  • F. Sebastiani
  • M. Tesconi
Chapter

Abstract

During the last 35 years, data management principles such as physical and logical independence, declarative querying, and cost-based optimization have made relational databases pervasive in organizations of every kind. More importantly, these technical advances enabled the first round of business intelligence applications and laid the foundation for managing and analyzing Big Data today.

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • G. Amato (1)
  • L. Candela (1)
  • D. Castelli (1)
  • A. Esuli (1)
  • F. Falchi (1)
  • C. Gennaro (1)
  • F. Giannotti (1)
  • A. Monreale (2)
  • M. Nanni (1)
  • P. Pagano (1)
  • L. Pappalardo (2)
  • D. Pedreschi (2)
  • F. Pratesi (1)
  • F. Rabitti (1)
  • S. Rinzivillo (1)
  • G. Rossetti (2)
  • S. Ruggieri (2)
  • F. Sebastiani (1)
  • M. Tesconi (3)

  1. ISTI - CNR, Pisa, Italy
  2. University of Pisa, Pisa, Italy
  3. IIT-CNR, Pisa, Italy
