Data Science and Distributed Intelligence: Recent Developments and Future Insights

  • Conference paper
Intelligent Distributed Computing VI

Part of the book series: Studies in Computational Intelligence (SCI, volume 446)

Abstract

Big Data, Data Science and MapReduce are three keywords that have flooded research papers and technical articles over the last two years. Moreover, because the computational infrastructures that support Data Science (such as Clouds and Grids) are inherently distributed, it is natural to view Distributed Intelligence as the underlying paradigm for novel Data Science challenges. Following this major trend, this paper provides background on these new terms and then discusses recent developments in the data mining and data warehousing areas in light of the aforementioned keywords. Finally, we offer our insights into the next stages of research and development in this area.




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cuzzocrea, A., Gaber, M.M. (2013). Data Science and Distributed Intelligence: Recent Developments and Future Insights. In: Fortino, G., Badica, C., Malgeri, M., Unland, R. (eds) Intelligent Distributed Computing VI. Studies in Computational Intelligence, vol 446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32524-3_18


  • DOI: https://doi.org/10.1007/978-3-642-32524-3_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32523-6

  • Online ISBN: 978-3-642-32524-3

  • eBook Packages: Engineering, Engineering (R0)
