Data Science and Distributed Intelligence: Recent Developments and Future Insights

Cuzzocrea, Alfredo; Gaber, Mohamed Medhat

doi:10.1007/978-3-642-32524-3_18


Part of the book series: Studies in Computational Intelligence (SCI, volume 446)


Abstract

Big Data, Data Science and MapReduce are three keywords that have flooded our research papers and technical articles during the last two years. Moreover, because the computational infrastructures that support Data Science (such as Clouds and Grids) are inherently distributed, it is natural to view Distributed Intelligence as the underlying paradigm for novel Data Science challenges. Following this major trend, in this paper we provide background on these new terms, followed by a discussion of recent developments in the data mining and data warehousing areas in the light of the aforementioned keywords. Finally, we offer our insights into the next stages of research and development in this area.
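For readers new to the MapReduce keyword above, the following is a minimal single-process Python sketch of the programming model described by Dean and Ghemawat at Google. A user-supplied map function emits key/value pairs, the framework groups those pairs by key (the "shuffle"), and a user-supplied reduce function folds each group into one result. The names `map_reduce`, `word_count_mapper` and `word_count_reducer` are hypothetical illustrations. This is not the authors' code and not the Hadoop API, and it deliberately omits partitioning, fault tolerance and distribution across machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Hypothetical in-memory illustration of the map, shuffle and reduce phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    # Map phase: every input record emits zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase: collect all values that share a key.
            # In a real cluster this grouping happens across the network.
            groups[key].append(value)
    # Reduce phase: fold each key's list of values into a single result.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def word_count_mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated token in the line.
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word: str, counts: List[int]) -> int:
    # Sum the partial counts emitted for this word.
    return sum(counts)


if __name__ == "__main__":
    lines = ["big data and data science", "data science and mapreduce"]
    print(map_reduce(lines, word_count_mapper, word_count_reducer))
```

Running the script prints `{'and': 2, 'big': 1, 'data': 3, 'mapreduce': 1, 'science': 2}`. Because the mapper and reducer are pure functions of their inputs, a real framework can run many copies of them in parallel on different machines. This property is what makes the model a fit for the distributed infrastructures mentioned in the abstract.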



Author information

Authors and Affiliations

ICAR-CNR and University of Calabria, I-87036, Cosenza, Italy
Alfredo Cuzzocrea & Mohamed Medhat Gaber
School of Computing, University of Portsmouth, Portsmouth, PO1 3HE, Hampshire, UK
Alfredo Cuzzocrea & Mohamed Medhat Gaber


Editor information

Editors and Affiliations

Department of Electronics, Informatics, University of Calabria, Via P. Bucci, cubo 41C, Rende (CS), 87036, Italy
Giancarlo Fortino
Faculty of Automatics, Computers and Ele, Software Engineering Department, University of Craiova, Bvd. Decebal 107, Craiova, 200440, Romania
Costin Badica
Dipartimento di Ingegneria Elettrica, Università di Catania, V.le Andrea Doria, 6, Catania, 95125, Italy
Michele Malgeri
Institute for Computer Science and, University of Duisburg-Essen, Schuetzenbahn 70, Essen, 45117, Germany
Rainer Unland


About this paper

Cite this paper

Cuzzocrea, A., Gaber, M.M. (2013). Data Science and Distributed Intelligence: Recent Developments and Future Insights. In: Fortino, G., Badica, C., Malgeri, M., Unland, R. (eds) Intelligent Distributed Computing VI. Studies in Computational Intelligence, vol 446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32524-3_18


DOI: https://doi.org/10.1007/978-3-642-32524-3_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32523-6
Online ISBN: 978-3-642-32524-3
eBook Packages: Engineering (R0)
