Massive Data Analysis: Tasks, Tools, Applications, and Challenges

Chapter in: Big Data Analytics

Abstract

In this study, we survey state-of-the-art programming, computing, and storage technologies in the massive data analytics landscape. We first present a detailed taxonomy of the types of analytics that can be performed on massive data, with examples of each type. Next, we highlight the technology trends in massive data analytics that are available to corporations, government agencies, and researchers. We also enumerate several opportunities for turning massive data into knowledge. We then describe two case studies of massive data analytics under investigation in our research group: recommendation systems in e-commerce applications, and link discovery to predict unknown associations between medical concepts. Finally, we discuss the lessons we have learned and the open challenges that researchers and businesses face in the field of massive data analytics.

Author information

Corresponding author

Correspondence to Vijay Raghavan.

Copyright information

© 2016 Springer India

About this chapter

Cite this chapter

Pusala, M.K., Amini Salehi, M., Katukuri, J.R., Xie, Y., Raghavan, V. (2016). Massive Data Analysis: Tasks, Tools, Applications, and Challenges. In: Pyne, S., Rao, B., Rao, S. (eds) Big Data Analytics. Springer, New Delhi. https://doi.org/10.1007/978-81-322-3628-3_2
