Big Data Tools: Haddop, MongoDB and Weka

  • Conference paper
Data Mining and Big Data (DMBD 2016)

Abstract

Big Data is a term that describes the exponential growth of all sorts of data, structured and unstructured, from different sources (databases, social networks, the web, etc.), which, depending on how it is used, may become a benefit or an advantage for a company. This paper shows the current importance of Big Data and presents some of the algorithms that can be used to reveal patterns, trends and data associations that may generate valuable information in real time. It also describes the characteristics and applications of some of the tools currently used for data analysis, to help establish which technology is most suitable to implement according to the needs or information required.

Author information

Corresponding author

Correspondence to Paula Catalina Jaraba Navas.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Jaraba Navas, P.C., Guacaneme Parra, Y.C., Rodríguez Molano, J.I. (2016). Big Data Tools: Haddop, MongoDB and Weka. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2016. Lecture Notes in Computer Science, vol 9714. Springer, Cham. https://doi.org/10.1007/978-3-319-40973-3_45

  • DOI: https://doi.org/10.1007/978-3-319-40973-3_45

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-40972-6

  • Online ISBN: 978-3-319-40973-3

  • eBook Packages: Computer Science, Computer Science (R0)
