A Machine Learning Perspective on Big Data Analysis

Big Data Analysis: New Algorithms for a New Society

Part of the book series: Studies in Big Data (SBD, volume 16)

Abstract

This chapter surveys the field of Big Data analysis from a machine learning perspective. In particular, it contrasts Big Data analysis with data mining, which is based on machine learning, reviews its achievements and discusses its impact on science and society. The chapter concludes with a summary of the book’s contributing chapters divided into problem-centric and domain-centric essays.

Notes

  1. While this application was originally considered a success, it subsequently produced disappointing results and is now being improved [4].

  2. Please note that graphs were sometimes considered in traditional data mining (e.g., as structures of chemical compounds), but the graphs in question were of much smaller size than those considered today.

References

  1. Abiteboul, S.: Querying semi-structured data. In: ICDT ’97 Proceedings of the 6th International Conference on Database Theory, pp. 1–18 (1997)

  2. An interview with Michael Jordan—Why Big Data Could Be a Big Fail. IEEE Spectrum. (Posted by Lee Gomes, 20 Oct 2014)

  3. Anderson, C.: The End of Theory. The data deluge makes the scientific method obsolete. Wired Magazine, 16/07 (2008, June 23)

  4. Auerbach, D.: The Mystery of the Exploding Tongue. How reliable is Google Flu Trends? Slate Web page. http://www.slate.com/articles/technology/bitwise/2014/03/google_flu_trends_reliability_a_new_study_questions_its_methods.html (2014)

  5. Azzara, M.: Big Data Ethics: Transparency, Privacy, and Identity. Blog cmo.com. (Retrieved 2015)

  6. Barbaro, M., Zeller, Jr, T.: A Face Is Exposed for AOL Searcher No. 4417749. The New York Times Magazine. (August 9, 2006)

  7. Barbier, G., Liu, H.: Data Mining in Social Media. In: Aggarwal, C. (ed.) Social Network Data Analytics, pp. 327–352. Kluwer Academic Publishers, Springer (2011)

  8. Bekkerman, R., Bilenko, M., Langford, J.: Scaling Up Machine Learning. Parallel and Distributed Approaches. Cambridge University Press, Cambridge (2011)

  9. Berkeley Data Analysis Stack. https://amplab.cs.berkeley.edu/software/

  10. Beyer, M.A., Laney, D.: The importance of "Big Data": a definition. Gartner Publications, pp. 1–9 (2012). See also: http://www.gartner.com/it-glossary/big-data

  11. Bifet, A., Holmes, G., Kirkby, R., Pfahringer, B.: MOA: massive online analysis. J. Mach. Learn. Res. 11, 1601–1604 (2010)

  12. Billion Price Project. http://bpp.mit.edu/

  13. Boyd, D., Crawford, K.: Six provocations for Big Data. Presented at "A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society" Oxford Internet Institute, Sept 21 (2011)

  14. Boyd, D., Crawford, K.: Critical questions for big data. Inf. Commun. Soc. 15(5), 662–679 (2012)

  15. Che, D., Safran, M., Peng, Z.: From big data to big data mining: challenges, issues and opportunities. In: Hong, B., et al. (eds.) DASFAA Workshops, Springer LNCS 7827, pp. 1–15 (2013)

  16. Chen, M., Mao, S., Liu, Y.: Big data: a survey. Mobile Netw. Appl. 19, 171–209 (2014)

  17. Dai, C., Lin, D., Bertino, E., Kantarcioglu, M.: An approach to evaluate data trustworthiness based on data provenance. In: Proceedings of the 5th VLDB Workshop on Secure Data Management, pp. 82–98 (2008)

  18. Davidson, S., Freire, J.: Provenance and scientific workflows: challenges and opportunities. In: Proceedings of the SIGMOD’08 (2008)

  19. Davis, K.: Ethics of Big Data. Balancing Risk and Innovation. O’Reilly (2012)

  20. De Mauro, A., Greco, M., Grimaldi, M.: What is big data? A consensual definition and a review of key research topics. In: Proceedings of 4th Conference on Integrated Information (2014)

  21. Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 71–80 (2000)

  22. Einav, L., Levin, J.D.: The data revolution and economic analysis. National Bureau of Economic Research Working Paper, no. 19035 (2013)

  23. Fan, W., Bifet, A.: Mining big data: current status, and forecast to the future. SIGKDD Explor. Newsl. 12(2), 1–5 (2013)

  24. Frontiers in Massive Data Analysis. The National Research Council, the National Academy of Sciences, USA (2013)

  25. Future Attribute Screening Technology. Wikipedia article. https://en.wikipedia.org/wiki/Future_Attribute_Screening_Technology

  26. Gaber, M., Zaslavsky, A., Krishnaswamy, S.: Mining data streams: a review. ACM Sigmod Record 34(2), 18–26 (2005)

  27. Gama, J.: Knowledge Discovery from Data Streams, 1st edn. Chapman & Hall/CRC (2010)

  28. Ghoting, A., Kambadur, P., Pednault, E., Kannan, R.: NIMBLE: A toolkit for the implementation of parallel data mining and machine learning algorithms on MapReduce. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining KDD 2011, pp. 334–342 (2011)

  29. Ginsberg, J., Mohebbi, M.H., Patel, R.S., Brammer, L., Smolinski, M.S., Brilliant, L.: Detecting influenza epidemics using search engine query data. Nature 457(7232), 1012–1014 (19 Feb 2009)

  30. Glavic, B.: Big Data provenance: challenges and implications for benchmarking. In: Specifying Big Data Benchmarks, pp. 72–80. Springer (2014)

  31. Gonzalez, M.C., Hidalgo, C.A., Barabasi, A.L.: Understanding individual human mobility patterns. Nature 453, 779–782 (2008)

  32. Hadoop. http://hadoop.apache.org

  33. Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. San Francisco, Morgan Kaufmann (2005)

  34. Harford, T.: Big Data: are we making a big mistake? Financial Times, March 28 (2014)

  35. Hashem, I., Yaqoob, I., Anuar, N., Mokhtar, S., Gani, A., Khan, S.: The rise of big data on cloud computing. Review and open research issues. Inf. Syst. 47, 98–115 (2015)

  36. How big data analysis helped increase Walmart’s sales turnover. DeZyre Web page (23 May 2015)

  37. Kang, U., Faloutsos, C.: Big graph mining: algorithms and discoveries. ACM SIGKDD Explor. Newsl. 14(2), 29–36 (2012)

  38. Kraska, T., Talwalkar, A., Duchi, J.C., Griffith, R., Franklin, M.J., Jordan, M.I.: MLbase: A distributed machine-learning system. In: Proceedings of the Sixth Biennial Conference on Innovative Data Systems Research (2013)

  39. Krempl, G., Zliobaite, I., Brzezinski, D., Hullermeier, E., Last, M., Lemaire, V., Noack, T., Shaker, A., Sievi, S., Spiliopoulou, M., Stefanowski, J.: Open challenges for data stream mining research. ACM SIGKDD Explor. 16(1), 1–10 (June 2014)

  40. Mahout software. http://mahout.apache.org/

  41. Maimon, O., Rokach, L. (eds.): The Data Mining and Knowledge Discovery Handbook. Springer (2005)

  42. Mannila, H.: Data mining: machine learning, statistics, and databases. In: Proceedings of the Eighth International Conference on Scientific and Statistical Database Management. Stockholm June 18–20, pp. 1–8 (1996)

  43. Manning, C., Schutze, H.: Foundations of Statistical Natural Language Processing. MIT Press (1999)

  44. Marcus, G., Davis, E.: Eight (No, Nine!) Problems With Big Data. New York Times (Apr 6, 2014)

  45. Matwin, S.: Privacy-preserving data mining techniques: survey and challenges. In: Custers, B., Calders, T., Schermer, B., Zarsky, T. (eds.) Discrimination and Privacy in the Information Society. Springer Series on Studies in Applied Philosophy, Epistemology and Rational Ethics, vol. 3, pp. 209–221 (2013)

  46. Matwin, S.: Machine learning: four lessons and what is next? Bull. Pol. AI Soc. 2, 2–7 (2013)

  47. Mayer-Schonberger, V., Cukier, K.: Big Data: A Revolution That Will Transform How We Live, Work and Think. Eamon Dolan/Houghton Mifflin Harcourt (2013)

  48. Morales, G., Bifet, A.: SAMOA: scalable advanced massive online analysis. J. Mach. Learn. Res. 16, 149–153 (2015)

  49. Narayanan, A., Shmatikov, V.: Robust De-anonymization of Large Datasets (How to Break Anonymity of the Netflix Prize Dataset). In: Proceedings of the 2008 IEEE Symposium on Security and Privacy SP’08, pp. 111–125 (2008)

  50. Piatetsky-Shapiro, G., Matheus, C. (eds): Knowledge discovery in databases. AAAI/MIT Press (1991)

  51. Pietsch, W.: Big Data? The New Science of Complexity. In: 6th Munich-Sydney-Tilburg Conference on Models and Decisions (Munich; 10–12 April 2013)

  52. Reinventing Society in the Wake of Big Data—Edge’s interview with Alex "Sandy" Pentland (Posted August 30, 2012)

  53. Ritter, D.: When to act on a correlation and when not to. Harvard Business Review, March 19 (2014)

  54. Roddick, J., Hornsby, K., Spiliopoulou, M.: An updated bibliography of temporal, spatial, and spatio-temporal data mining research. Lect. Notes Comput. Sci. 2007, 147–163 (2001)

  55. Rudin, C., Passonneau, R., Radeva, A., Jerome, S., Issac, D.: 21st century data miners meet 19th century electrical cables. IEEE Comput. 103–105 (June 2011)

  56. Rudin, C., et al.: Machine learning for the New York City power grid. IEEE Trans. Pattern Anal. Mach. Intell. 34(2), 328–345 (2012)

  57. Shekhar, S.: What is special about mining spatial and spatio-temporal datasets? Tutorial (2014). http://www-users.cs.umn.edu/~shekhar/talk/sdm2.html

  58. Simmhan, Y., Plale, B., Gannon, D.: A survey on data provenance techniques. Technical Report Indiana University, IUB-CS-TR618 (2005)

  59. Singh, D., Reddy, C.: A survey on platforms for Big Data analytics. J. Big Data 1(8), 2–20 (2014)

  60. Sloan Digital Sky Survey. Wikipedia article. https://en.wikipedia.org/wiki/Sloan_Digital_Sky_Survey

  61. Sun, Y., Han, J.: Mining Heterogeneous Information Networks: Principles and Methodologies. Morgan & Claypool Publishers (2012)

  62. The h2o software. http://0xdata.com/h2o

  63. Thompson, C.: What Is IBM’s Watson? The New York Times Magazine, June 16 (2010)

  64. Tufekci, Z.: Big Data: Pitfalls, methods and concepts for an emergent field. SSRN (March 2013). http://dx.doi.org/10.2139/ssrn.2229952

  65. Venkateswara Rao, K., Govardhan, A., Chalapati Rao, K.V.: Spatiotemporal data mining: issues, tasks and applications. Int. J. Comput. Sci. & Eng. Surv. (IJCSES) 3(1) (Feb 2012)

  66. Vucetic, S., Obradovic, Z.: Discovering homogeneous regions in spatial data through competition. In: Proceedings of the 17th International Conference of Machine Learning ICML, pp. 1095–1102 (2000)

  67. Zhou, Z.H., Chawla, N., Jin, Y., Williams, G.: Big Data opportunities and challenges: discussions from data analytics perspectives. IEEE Comput. Intell. Mag. 9(4), 62–74 (2014)

Author information

Corresponding author

Correspondence to Nathalie Japkowicz.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Japkowicz, N., Stefanowski, J. (2016). A Machine Learning Perspective on Big Data Analysis. In: Japkowicz, N., Stefanowski, J. (eds) Big Data Analysis: New Algorithms for a New Society. Studies in Big Data, vol 16. Springer, Cham. https://doi.org/10.1007/978-3-319-26989-4_1

  • DOI: https://doi.org/10.1007/978-3-319-26989-4_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26987-0

  • Online ISBN: 978-3-319-26989-4

  • eBook Packages: Engineering, Engineering (R0)
