
A Big Data Primer

Big Data-Enabled Nursing

Part of the book series: Health Informatics ((HI))

Abstract

The aim of this chapter is to describe the history of big data and its characteristics (variety, velocity, and volume) and to serve as a big data primer. Many organizations are using big data to improve their operations and/or create new products and services. Data collection, that is, the methods for generating, sensing, and storing data, will be described. Mobile and internet technologies have transformed data collection for these organizations, and new sources are emerging at an unheard-of speed. Due to the explosion of data, the teams needed to manage the data have evolved to include data scientists, domain experts, computer scientists, visualization experts, and more. Ideas about intellectual property are also changing: who owns the data, the products generated from the data, and the applications of the data? Challenges and tools for data analytics and data visualization of big data will be described, thus setting the foundation for the rest of the book.



Corresponding author

Correspondence to Judith J. Warren, Ph.D., R.N., F.A.A.N., F.A.C.M.I.


Case Study 3.1: Big Data Resources—A Learning Module


Abstract

This case study is a compilation of resources for a learner to explore to gain beginning knowledge and skill in big data, data science, and data visualization. The resources focus on acquiring knowledge through books, white papers, videos, conferences, and online learning opportunities. There are also resources for learning about the hardware and software needed to engage in big data.

Keywords

Big data; Data science; Data visualization; Data wrangling; Hadoop/MapReduce; Data analytics; Data scientist; Data science teams; Volume/variety/velocity of data sets; Data products

3.1.1 Introduction

The volume, variety, and velocity of big data exceed those of the datasets common in health care research and operations. New technologies to manage and analyze big data are being developed and tested at a rapid rate. This life cycle moves so fast that it is difficult to learn the technology and approaches, let alone keep up with the latest innovations. The phases of this life cycle are development, testing, discarding, testing, adopting, combining, using, and discarding/reworking. These phases transpire in swift iterative cycles, and data scientists work with a toolbox that ranges from well-developed software to niche software designed for specific uses, much of it open source.

Today we are overwhelmed with an unprecedented amount of information and data. Big data comes from all kinds of sources: global positioning system (GPS) devices, loyalty shopping cards, online searches and selections, genomic information, traffic and weather information, health data from all sorts of personal devices (person-generated health data), as well as data created in healthcare during inpatient and outpatient visits. Data is collected every second of every day. These types of data, including unstructured raw data, have been used in other industries to understand their business and create new products. Healthcare has been slower to adopt the use of big data in this way. The 2013 report by McKinsey Global Institute proposes that the effective use of big data in healthcare could create large value for the healthcare industry, over $300 billion every year (Kayyali B, Knott D, Van Kuiken S. The big-data revolution in US health care: Accelerating value and innovation. McKinsey & Company. 2013. Accessed at http://www.mckinsey.com/industries/healthcare-systems-and-services/our-insights/the-big-data-revolution-in-us-health-care).

The effective use of big data requires a data science approach to find and analyze subsets of data that administrators, clinicians, and researchers will find usable. Unlike a relational database, where writing a query is fairly straightforward, gathering data from multiple big data stores or warehouses is much more complex. Managing an incoming data stream of extraordinary volume, velocity, and variety requires the expertise of a team. This case study provides a beginning resource for learning about big data, data science, and data visualization.

3.1.2 Resources for Big Data

As big data has caught the imagination of corporations and health care, the resources have exploded and most are readily available on the Internet. The following resources have been selected for learners who are just beginning their exploration of big data and a few that will stretch their knowledge towards competence. As you do your own searches, you will find many more. This listing will get you into the field and Internet space to find more resources that fit your learning style.

3.1.2.1 Big Data Conferences

Conferences are good places to explore a new field or gain more understanding of a field in which you have expertise. Networking is key at these events and can link you to others for future project work. These are just the tip of the iceberg of conferences, so enjoy looking for new ones near you.

  1. In 2013, the University of Minnesota School of Nursing convened the first conference called Nursing Knowledge: Big Data Science, http://www.nursing.umn.edu/icnp/center-projects/big-data/index.htm. The first conference was invitational and explored the potential of big data for the improvement of patient outcomes as the result of nursing care. The conference was so successful that it has since been held annually and opened to all registrants. Nursing Knowledge: Big Data Science is a working conference with many workgroups creating projects that are making an impact in nursing research, education, and practice.

  2. “Big Data 2 Knowledge,” hosted by the National Institutes of Health (NIH), also has conferences, training sessions, and webinars. These events are geared towards creating a research cohort that is expert in big data and data analytics.

  3. The Strata + Hadoop World Big Data conference is a meeting where business decision makers, strategists, architects, developers, and analysts gather to discuss big data and data science. At the conference you explore big data and hear what is emerging in the industry (http://conferences.oreilly.com/strata/hadoop-big-data-ca). O’Reilly Media and others put on this conference and afterwards post all the presentations to their web site. So even if you can’t attend, you can hear about cutting-edge big data.

3.1.2.2 Big Data Books and Articles

A tried and traditional way to learn about any subject is through books, journal articles, and white papers. The following are basic references to get you started in the big data initiative.

  1. Anderson C. Creating a Data-Driven Organization. Sebastopol, CA: O’Reilly Media; 2015. http://shop.oreilly.com/product/0636920035848.do

  2. Betts R, Hugg J. Fast Data: Smart and at Scale. Sebastopol, CA: O’Reilly Media; 2015. https://voltdb.com/blog/introducing-fast-data-smart-and-scale-voltdbs-new-recipes-ebook

  3. Brennan PF, Bakken S. Nursing needs big data and big data needs nursing. Journal of Nursing Scholarship. 2015;47:477–484.

  4. Chartier T. Big Data: How Data Analytics Is Transforming the World. Chantilly, VA: The Great Courses. 2014. (includes video lectures). (http://www.thegreatcourses.com/courses/big-data-how-data-analytics-is-transforming-the-world.html)

  5. Davenport T, Dyche J. Big data in big companies. 2013. http://www.sas.com/reg/gen/corp/2266746. Accessed 15 Dec 2015.

  6. Mayer-Schonberger V, Cukier K. Big Data: A revolution that will transform how we live, work, and think. New York: Houghton Mifflin Harcourt Publishing. 2013.

  7. O’Reilly Radar Team. Planning for big data: A CIO’s handbook to the changing data landscape. Sebastopol, CA: O’Reilly Media. 2012. http://www.oreilly.com/data/free/planning-for-big-data.csp

  8. O’Reilly Team. Big data now. Sebastopol, CA: O’Reilly Media. 2012.

  9. Patil DJ, Mason H. Data driven: Creating a data culture. Sebastopol, CA: O’Reilly Media. 2015. http://datasciencereport.com/2015/07/31/free-ebook-data-driven-creating-a-data-culture-by-chief-data-scientists-dj-patil-hilary-mason/#.Vp6x33n2bL8

3.1.2.3 Big Data Videos

For those who need to see and hear, videos are great. Below are some from YouTube and other websites. Don’t forget to look at Tim Chartier’s work listed in the Books section. Great Courses combine expert faculty, a book, and video lectures. Explore TED Talks for more information about Big Data.

  1. Big Data Tutorials and TED Talks, http://www.analyticsvidhya.com/blog/2015/07/big-data-analytics-youtube-ted-resources/

  2. Kenneth Cukier: Big data is better data, https://www.youtube.com/watch?v=8pHzROP1D-w

  3. The Secret Life of Big Data | Intel, https://www.youtube.com/watch?v=CNoi-XqwJnA (a good overview of the history of Big Data, a must watch)

  4. What is Big Data? https://www.youtube.com/watch?v=c4BwefH5Ve8

  5. What is BIG DATA? BIG DATA Tutorial for Beginners, https://www.youtube.com/watch?v=2NLyIqU-xwg

  6. What Is Apache Hadoop? http://hadoop.apache.org/

  7. What is Big Data and Hadoop? https://www.youtube.com/watch?v=FHVuRxJpiwI

  8. What Does The Internet of Things Mean? https://www.youtube.com/watch?v=Q3ur8wzzhBU

  9. MapReduce, https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html

3.1.2.4 Big Data Web Sites

Many companies provide information, free books, white papers, tutorials, and free trial software on their web sites. These sites are a rich resource. Gartner and Forrester are companies that evaluate and rate emerging companies and products in the big data industry.

  1. IBM Big Data and Analytics platform, now known as IBM Watson Foundations, http://www.ibmbigdataanalytics.com

  2. Forrester Wave, https://www.forrester.com/The+Forrester+Wave+Big+Data+Hadoop+Distributions+Q1+2016/fulltext/-/E-res121574#AST1022630

  3. Gartner, http://www.gartner.com/technology/research/methodologies/research_mq.jsp

    (a) Magic Quadrant

    (b) Hype Cycle

    (c) Critical Capabilities

  4. Intel Processors, http://www.intel.com/content/www/us/en/homepage.html

    (a) The Butterfly Dress, https://www.youtube.com/watch?v=6ELuq3CzJys (a bit of fun with data and technology)

    (b) 50th Anniversary of Moore’s Law, http://newsroom.intel.com/docs/DOC-6429 (if you are in informatics, you must know about Moore’s Law)

    (c) How Intel Gave Stephen Hawking his Voice, http://www.wired.com/2015/01/intel-gave-stephen-hawking-voice; https://www.youtube.com/watch?v=JA0AZUj2lOs

  5. Kaggle, www.kaggle.com

  6. O’Reilly Media, https://www.oreilly.com/topics/data

  7. SAS, http://www.sas.com/en_us/insights/big-data.html

  8. VoltDB, https://voltdb.com

  9. Yuhanna N. (August 3, 2015). The Forrester Wave: In-Memory Database Platforms, Q3 2015. http://go.sap.com/docs/download/2015/08/4481ad9e-3a7c-0010-82c7-eda71af511fa.pdf

  10. Zaloni, http://www.zaloni.com/health-and-life-sciences

3.1.3 Resources for Data Science

Data science is composed of data wrangling and data analysis. Data wrangling is the process of cleaning and mapping data from one “raw” form into another format. Then algorithms can be applied to make sense of big data. The following resources have been selected for learners who are just beginning their exploration of data science and a few that will stretch their knowledge towards competence. As you do your own searches, you will find many more. This listing will get you into the field and Internet space to find more resources that fit your learning style.
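As a minimal sketch of what data wrangling can look like in practice, the following Python example cleans a small raw extract and maps it into structured records that an analysis step could consume. The column names, the sample values, and the `wrangle` function are hypothetical and were made up purely for illustration; only the Python standard library is used.

```python
import csv
import io
import json

# Hypothetical raw export (invented for illustration): inconsistent spacing,
# mixed case, and one missing reading.
RAW = """patient_id, Unit ,heart_rate
 101,ICU,88
102, icu ,
103,Med-Surg, 72
"""


def wrangle(raw_text):
    """Clean raw CSV text and map it to a list of typed records."""
    reader = csv.reader(io.StringIO(raw_text))
    # Normalize the header names: trim whitespace and lowercase them.
    header = [name.strip().lower() for name in next(reader)]
    records = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        cells = dict(zip(header, (cell.strip() for cell in row)))
        records.append({
            "patient_id": int(cells["patient_id"]),
            "unit": cells["unit"].upper(),
            # Keep a missing reading as None rather than guessing a value.
            "heart_rate": int(cells["heart_rate"]) if cells["heart_rate"] else None,
        })
    return records


records = wrangle(RAW)
# Map the cleaned records into another format (JSON) for downstream analysis.
print(json.dumps(records, indent=2))
```

The design choice worth noting is that the missing value is preserved as `None` instead of being filled in; deciding how to treat gaps is part of the analysis, not the cleaning.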

3.1.3.1 Data Science Conferences

Conferences are good places to explore a new field or gain more understanding of a field in which you have expertise. Networking is key at these events and can link you to others for future project work.

  1. “Big Data 2 Knowledge,” hosted by the National Institutes of Health (NIH), has conferences, training sessions, and webinars. These events are geared towards creating a research cohort that is expert in big data and data analytics.

  2. The Data Science Conference, http://www.thedatascienceconference.com.

3.1.3.2 Data Science Books and Articles

A tried and traditional way to learn about any subject is through books, journal articles, and white papers. The following are basic references to get you started in data science.

  1. Ghavami PK. Clinical Intelligence: The big data analytics revolution in healthcare: A framework for clinical and business intelligence. CreateSpace Independent Publishing Platform. 2014.

  2. Grus J. Data science from scratch: First principles with Python. Sebastopol, CA: O’Reilly Media. 2015.

  3. Gualtieri M, Curran R. The Forrester Wave: Big data predictive analytics solutions, Q2, 2015. April 1, 2015. https://www.sas.com/content/dam/SAS/en_us/doc/analystreport/forrester-wave-predictive-analytics-106811.pdf

  4. Janert PK. Data analysis with open source tools: A hands-on guide for programmers and data scientists. Sebastopol, CA: O’Reilly Media. 2010.

  5. Loukides M. What is data science? Sebastopol, CA: O’Reilly Media. 2012.

  6. Marconi K, Lehmann H. Big data and health analytics. Boca Raton, FL: CRC Press. 2015.

  7. O’Neil C, Schutt R. Doing data science: Straight talk from the frontline. Sebastopol, CA: O’Reilly Media. 2015.

  8. Optum. Getting from big data to good data: Creating a foundation for actionable analytics. 2015. https://www.optum.com/content/dam/optum/CMOSpark%20Hub%20Resources/White%20Papers/OPT_WhitePaper_ClinicalAnalytics_ONLINE_031414.pdf

  9. Patil DJ. Building data science teams: The skills, tools, and perspectives behind great data science groups. Sebastopol, CA: O’Reilly Media. 2011.

  10. Provost F, Fawcett T. Data science for business: What you need to know about data mining and data-analytic thinking. Sebastopol, CA: O’Reilly Media. 2013.

  11. Rattenbury T, Hellerstein JM, Heer J, Kandel S. Data wrangling: Techniques and concepts for agile analysts. Sebastopol, CA: O’Reilly Media. 2015.

  12. Tailor K. The patient revolution: How big data and analytics are transforming the health care experience. Hoboken, NJ: John Wiley and Sons. 2016.

  13. Trifacta. Six Core Data Wrangling Activities. 2015. https://www.trifacta.com/wp-content/uploads/2015/11/six-core-data-wrangling-activities-ebook.pdf. Accessed 15 Jan 2016.

3.1.3.3 Data Science Videos

For those who need to see and hear, videos are great. Below are some from YouTube and other websites. Explore TED Talks for more information about Big Data.

  1. Analytics 2013—Keynote—Jim Goodnight, SAS, https://www.youtube.com/watch?v=AEI0fBQYJ1c

  2. Big Data Analytics: The Revolution Has Just Begun, https://www.youtube.com/watch?v=ceeiUAmbfZk

  3. Building Data Science Teams, https://www.youtube.com/watch?v=98NrsLE6ot4

  4. Deep Learning: Intelligence from Big Data, https://www.youtube.com/watch?v=czLI3oLDe8M

  5. The Future of Data Science—Data Science @ Stanford, https://www.youtube.com/watch?v=hxXIJnjC_HI

  6. The Patient Revolution: How Big Data and Analytics Are Transforming the Health Care Experience, https://www.youtube.com/watch?v=oDztVSDUbxo

3.1.3.4 Data Science Web Sites

Many companies provide information, free books, white papers, tutorials, and free trial software on their web sites. These sites are a rich resource.

  1. Alteryx, http://www.alteryx.com.

  2. Data Science at NIH, https://datascience.nih.gov/bd2k

  3. IBM, http://www.ibmbigdataanalytics.com.

  4. Kaggle, the Home of Data Science, https://www.kaggle.com

  5. Python Programming Language, https://www.python.org/

  6. R Programming Language, https://www.r-project.org/about.html

  7. SAS, https://www.sas.com/en_us/home.html.

  8. Trifacta, https://www.trifacta.com/support.

3.1.4 Resources for Data Visualization

Data visualization is the third part of big data. Humans can absorb more data when it is depicted in images or graphs. The following resources have been selected for learners who are just beginning their exploration of data visualization and a few that will stretch their knowledge towards competence. As you do your own searches, you will find many more. This listing will get you into the field and Internet space to find more resources that fit your learning style.

3.1.4.1 Data Visualization Conferences

Conferences are good places to explore a new field or gain more understanding of a field in which you have expertise. Networking is key at these events and can link you to others for future project work. Most conferences on big data and data science include presentations on data visualization.

3.1.4.2 Data Visualization Books and Articles

A tried and traditional way to learn about any subject is through books, journal articles, and white papers. The following are basic references to get you started in data visualization.

  1. Beegel J. Infographics for dummies. Hoboken, NJ: John Wiley & Sons. 2014.

  2. Few S. Now you see it: Simple visualization techniques for quantitative analysis. Oakland, CA: Analytics Press. 2009.

  3. Harris RL. Information graphics: A comprehensive reference. Atlanta, GA: Management Graphics. 1996.

  4. Jones B. Communicating data with Tableau: Designing, developing, and delivering data visualization. Sebastopol, CA: O’Reilly Media. 2014. http://cdn.oreillystatic.com/oreilly/booksamplers/9781449372026_sampler.pdf

  5. Knaflic CN. Storytelling with data: A data visualization guide for business professionals. Hoboken, NJ: John Wiley & Sons. 2015.

  6. Tufte ER. Envisioning information. Cheshire, CT: Graphics Press. 1990.

  7. Tufte ER. The visual display of quantitative information. Cheshire, CT: Graphics Press. 1983. (This is the classic text in visualization.)

  8. Tufte ER. Visual explanations: Images and quantities, evidence and narrative. Cheshire, CT: Graphics Press. 1997.

  9. Yau N. Data points: Visualization that means something. Indianapolis, IN: John Wiley & Sons; 2013.

  10. Yau N. Visualize this: The FlowingData guide to design, visualization, and statistics. Indianapolis, IN: John Wiley & Sons; 2011.

3.1.4.3 Data Visualization Videos

For those who need to see and hear, videos are great. Below are some from YouTube and other websites. Explore TED Talks for more information about Big Data.

  1. The beauty of data visualization, https://www.youtube.com/watch?v=5Zg-C8AAIGg

  2. The best stats you’ve ever seen, https://www.youtube.com/watch?v=usdJgEwMinM

  3. Designing Data Visualizations, https://www.youtube.com/watch?v=lTAeMU2XI4U

  4. The Future of Data Visualization, https://www.youtube.com/watch?v=vc1bq0qIKoA

  5. Introduction to Data Visualization, https://www.youtube.com/watch?v=XIgjTuDGXYY

3.1.4.4 Data Visualization Web Sites

Many companies provide information, free books, white papers, tutorials, and free software on their web sites.

  1. FlowingData, https://flowingdata.com.

  2. SAS, http://www.sas.com/en_us/home.html

    (a) Data visualization and why it is important, http://www.sas.com/en_us/insights/big-data/data-visualization.html

  3. Tableau, http://www.tableau.com/

    (a) Tableau. (2015). The 5 Most Influential Data Visualizations of All Time. http://www.tableau.com/top-5-most-influential-data-visualizations (note that Florence Nightingale’s is the number two graph)

    (b) Visual Analysis Best Practices: Simple Techniques for Making Every Data Visualization Useful and Beautiful, http://get.tableau.com/asset/10-tips-to-create-useful-beautiful-visualizations.html

  4. Trifacta, https://www.trifacta.com

3.1.5 Organizations of Interest

As the fields of big data, data science, and data visualization evolve, professional organizations will be formed. Listservs and blogs will be created. Academia will offer courses and degree programs. Certification and accreditation organizations will help to establish quality programs and individual performance. The following are just a sampling of what exists.

3.1.5.1 Professional Associations

Professionals will form professional organizations as they define their discipline. The organizations provide a forum for discussing practice, competencies, education, and the future.

  1. American Statistical Association, http://www.amstat.org/

  2. American Association of Big Data Professionals, https://aabdp.org/

    (a) Offers certification in various Big Data roles, https://aabdp.org/certifications.html

  3. Data Science Association, http://www.datascienceassn.org/

  4. Digital Analytics Association, http://www.digitalanalyticsassociation.org/

3.1.5.2 Listservs: A Sampling

Most web sites, organizations, industries, and publishers have listservs. Subscribing is a very efficient way to keep up with what is happening in these areas. The listserv is pushed to your email and lets you see the latest thoughts, conferences, books, and software in an industry that is evolving rapidly.

  1. 10 Data Science Newsletters To Subscribe To, https://datascience.berkeley.edu/10-data-science-newsletters-subscribe

  2. Information Management, http://www.information-management.com/news/big-data-analytics/Big-Data-Scientist-Careers-10026908-1.html

  3. O’Reilly Data Newsletter, http://www.oreilly.com/data/newsletter.html. Sign up to get the latest information about Big Data, Data Analytics, Data Visualization, and Conferences.

3.1.5.3 Certificates and Training: A Sampling

As jobs in these fields become more widely available, the demand for these skills will grow. Online education and formal degrees will become important for employers to consider. Certification may make a difference for employment.

  1. Data Science at Coursera, https://www.coursera.org/specializations/jhu-data-science

  2. Data at Coursera, https://www.coursera.org/specializations/big-data

  3. SAS Certification program, http://support.sas.com/certify/index.html

  4. MIT Professional Education, https://mitprofessionalx.mit.edu/about

  5. R Programming, https://www.coursera.org/learn/r-programming

3.1.5.4 Degree Programs: A Sampling

Degree programs are proliferating as the demand for big data professionals and data scientists increases. It is important to choose carefully before investing time and money in a program. Always look for programs that are accredited. The university or college should be accredited by an accreditor recognized by the US Department of Education, and the department or school in which the program resides should be accredited by the appropriate specialized accreditor. Accreditation helps assure the quality of the education.

  1. 23 Great Schools with Master’s Programs in Data Science, http://www.mastersindatascience.org/schools/23-great-schools-with-masters-programs-in-data-science

  2. Carnegie Mellon University, http://www.cmu.edu/graduate/data-science/

  3. Harvard, http://online-learning.harvard.edu/course/big-data-analytics

  4. List of Graduate Programs in Big Data & Data Science, http://www.amstat.org/education/bigdata.cfm

  5. Map of University Programs in Big Data Analytics, http://data-informed.com/bigdata_university_map/

  6. Northwestern Kellogg School of Management, http://www.kellogg.northwestern.edu/execed/programs/bigdata.aspx?gclid=CLTa_Jf5u8oCFYVFaQodCpwHag

3.1.6 Assessment of Competencies

Teachers and students have used Bloom’s Taxonomy to create objectives that specify what is to be learned. The levels of the taxonomy can also be used to guide evaluation of the student’s attainment of these objectives. Bloom’s Taxonomy was later revised to reflect cognitive processes as well as knowledge attainment; Krathwohl’s 2002 overview describes the revision (http://www.unco.edu/cetl/sir/stating_outcome/documents/Krathwohl.pdf). The new taxonomic hierarchy is as follows (Krathwohl, 2002, p. 215):

  1. “Remember—retrieving relevant knowledge from long-term memory

  2. Understand—determining the meaning of information

  3. Apply—using a procedure in a given situation

  4. Analyze—breaking material into its constituent parts and detecting the relationships between the parts and the whole

  5. Evaluate—making judgements based on criteria

  6. Create—putting elements together to form a coherent whole or make a product.”

For big data and data science assignments, the graduate student should be able to master the levels of “remember, understand, and apply” by engaging with the above resources. Objective assessments, in the form of tests, can then be used to determine mastery. Performance assessments are used to evaluate achievement of the higher levels of Bloom’s Taxonomy: analyze, evaluate, and create. Performance assessments are conducted by experts and faculty through the use of case studies, simulations, projects, presentations, or portfolios.

3.1.7 Learning Activities

The following are several learning activities designed to help you apply the knowledge and skills learned from the above resources. The Bloom level for each activity is listed.

  1. Conduct a web search on Hadoop and data warehouses. What did you learn about big data? What are the issues in storing and accessing data that has volume, velocity, and variety? Define Oozie, Pig, ZooKeeper, Hive, MapReduce, and Spark. How are they used in big data initiatives? (Bloom level—Understand)

  2. A good source of data to practice wrangling, analysis, and visualization is DATA.gov, http://www.data.gov. Download a file and then one of the free trial software packages and try different things. Trifacta lets you work on data wrangling. Excel can help with analysis. Tableau can help with visualization. Other sources of data are

    (a) https://r-dir.com/reference/datasets,

    (b) https://www.kaggle.com/datasets, and

    (c) http://www.pewresearch.org/data/download-datasets. (Bloom level—Apply)

  3. Take a data set and graph the data five different ways, e.g., scatter plot, histogram, radar chart, or other types of graphs. What insight did you get looking at the graphs? What analytic questions do you have that you would like to pursue based on the graphs? Were the graphs consistent? Was there one that represented the data best, and why? (Bloom level—Analyze)

  4. Keep a log of data that you personally generate through online use, mobile devices, smart phones, email, music, videos, pictures, financial transactions, and fitness/health apps. What format is this data in? Conduct an exploratory data analysis. Visualize the results several ways. Evaluate the visualizations using Yau’s (2013) four components: visual cues, coordinate system, scale, and context. (Bloom level—Evaluate)

  5. Create a list of keywords and a glossary for a document using Python. Download Python 3.4.4.msi (https://www.python.org/downloads) and numpy-1.11.0.zip (http://www.numpy.org). Select a document and save it as a '.txt' file (if the path to the file contains \U, as in C:\Users, replace it with \\U so the path will parse; Python treats \U in a string as the start of a Unicode escape). Develop a Python script to determine word frequency in the document (http://programminghistorian.org/lessons/counting-frequencies). Wrangle the data so that only words are left and remove stop words. From the remaining list select keywords and glossary words. (Bloom level—Create)
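For the word-frequency activity above, a script might start along the following lines. This is only a sketch that uses the Python standard library; the stop-word list, the function names, and the sample sentence are assumptions made for illustration, and it is not the code from the Programming Historian lesson cited in the activity.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption); real lists are much longer.
STOP_WORDS = {"a", "an", "and", "are", "as", "in", "is", "of", "the", "to"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Return a Counter of lowercase words in text, excluding stop words."""
    # Wrangle: lowercase the text and keep only runs of letters,
    # allowing an internal apostrophe as in "can't".
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(word for word in words if word not in stop_words)


def read_document(path):
    """Read a '.txt' file; on Windows, pass the path as a raw string."""
    # Example of a raw-string path: r"C:\Users\me\doc.txt"
    with open(path, encoding="utf-8") as handle:
        return handle.read()


# Hypothetical sample text standing in for a real document.
sample = ("Big data is big. The volume, variety, and velocity of data "
          "are the three Vs of big data.")
freq = word_frequencies(sample)

# The most frequent remaining words are candidate keywords.
print(freq.most_common(3))
```

To run it on your own document, replace `sample` with `read_document(...)` and your file path. The frequency list only suggests candidates; choosing the final keywords and glossary terms remains a judgment call.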

3.1.8 Guidance for Learners and Faculty Using the Module

This case study has provided learning resources for faculty and students to learn about big data, data science, and data visualization. The best strategy is to select some of the resources that best match your learning style—visual, audio, and tactile—and interact with them first. You may also want to use various search engines to search for other information about big data, data science, and data visualization. All online resources were accessed in January or February 2016. Download some programs and data and explore the process of wrangling, analysis and visualization.


Copyright information

© 2017 Springer International Publishing AG

About this chapter


Cite this chapter

Warren, J.J. (2017). A Big Data Primer. In: Delaney, C., Weaver, C., Warren, J., Clancy, T., Simpson, R. (eds) Big Data-Enabled Nursing. Health Informatics. Springer, Cham. https://doi.org/10.1007/978-3-319-53300-1_3


  • DOI: https://doi.org/10.1007/978-3-319-53300-1_3


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-53299-8

  • Online ISBN: 978-3-319-53300-1

  • eBook Packages: Medicine, Medicine (R0)
