Abstract
The aim of this chapter is to describe the history of big data and its characteristics (variety, velocity, and volume) and to serve as a big data primer. Many organizations are using big data to improve their operations and/or create new products and services. Data collection, that is, the methods for generating, sensing, and then storing data, will be described. Mobile and internet technologies have transformed data collection for these organizations, and new sources are emerging at an unprecedented speed. Due to the explosion of data, the teams needed to manage the data have evolved to include data scientists, domain experts, computer scientists, visualization experts, and more. The ideas of intellectual property are also changing. Who owns the data, the products generated from the data, and applications of the data? Challenges and tools for data analytics and data visualization of big data will be described, thus setting the foundation for the rest of the book.
Case Study 3.1: Big Data Resources—A Learning Module
Abstract
This case study is a compilation of resources for a learner to explore to gain beginning knowledge and skill in big data, data science, and data visualization. The resources focus on acquiring knowledge through books, white papers, videos, conferences, and online learning opportunities. There are also resources for learning about the hardware and software needed to engage in big data.
Keywords
Big data; Data science; Data visualization; Data wrangling; Hadoop/MapReduce; Data analytics; Data scientist; Data science teams; Volume/variety/velocity of data sets; Data products
3.1.1 Introduction
The volume, variety, and velocity of big data exceed those of the datasets common in health care research and operations. New technologies created to manage and analyze big data are being developed and tested at a rapid rate. This life cycle is moving so fast that it is difficult to learn the technology and approaches, let alone keep up with the latest innovations. The phases of this life cycle are development, testing, discarding, testing, adopting, combining, using, and discarding/reworking. These phases transpire in swift iterative cycles, and the data scientists who use the tools work with a toolbox that ranges from well-developed software to niche software designed for specific uses, much of which is open source.
Today we are overwhelmed with an unprecedented amount of information and data. Big data comes from all kinds of sources: global positioning devices (GPS), loyalty shopping cards, online searches and selections, genomic information, traffic and weather information, health data from all sorts of personal devices (person-generated health data), as well as data created during inpatient and outpatient healthcare visits. Data is collected every second of every day. These types of data, including unstructured raw data, have been used in other industries to understand their business and create new products. Healthcare has been slower to adopt the use of big data in this way. The 2013 report by McKinsey Global Institute proposes that the effective use of big data could create more than $300 billion in value for the healthcare industry every year (Kayyali B, Knott D, Van Kuiken S. The big-data revolution in US health care: Accelerating value and innovation. McKinsey & Company. 2013. Accessed at http://www.mckinsey.com/industries/healthcare-systems-and-services/our-insights/the-big-data-revolution-in-us-health-care).
The effective use of big data requires a data science approach to find and analyze subsets of data that administrators, clinicians, and researchers will find usable. Unlike querying a relational database, where writing a query is fairly straightforward, gathering data from multiple big data stores or warehouses is much more complex. Managing an incoming data stream of extraordinary volume, velocity, and variety requires the expertise of a team. This case study provides a beginning resource for learning about big data, data science, and data visualization.
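To illustrate the relational side of the contrast drawn above, the following minimal sketch shows how compact a query against a single relational table can be. It uses Python's built-in sqlite3 module with an in-memory database. The visits table, its columns, and the sample rows are hypothetical and are not taken from any source cited in this case study.

```python
import sqlite3

# Hypothetical table of health care visits held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (patient_id TEXT, visit_type TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("A01", "inpatient"), ("A01", "outpatient"), ("A02", "outpatient")],
)

# One declarative statement answers the question "how many visits of each type?"
rows = conn.execute(
    "SELECT visit_type, COUNT(*) FROM visits "
    "GROUP BY visit_type ORDER BY visit_type"
).fetchall()
conn.close()

for visit_type, total in rows:
    print(visit_type, total)
```

With big data, the same question may have to be asked across several stores with different structures, which is one reason a team approach is needed.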
3.1.2 Resources for Big Data
As big data has caught the imagination of corporations and health care, the resources have exploded, and most are readily available on the Internet. The following resources have been selected for learners who are just beginning their exploration of big data, along with a few that will stretch their knowledge towards competence. As you do your own searches, you will find many more. This listing will get you into the field and the Internet space to find more resources that fit your learning style.
3.1.2.1 Big Data Conferences
Conferences are good places to explore a new field or to gain more understanding of a field in which you already have expertise. Networking is key at these events and can link you to others for future project work. These are just the tip of the iceberg of conferences, so enjoy looking for new ones near you.
1. In 2013, the University of Minnesota School of Nursing convened the first Nursing Knowledge: Big Data Science conference, http://www.nursing.umn.edu/icnp/center-projects/big-data/index.htm. The first conference was invitational and explored the potential of big data to improve patient outcomes resulting from nursing care. The conference was so successful that it has since been held annually and opened to all registrants. Nursing Knowledge: Big Data Science is a working conference with many workgroups creating projects that are making an impact on nursing research, education, and practice.
2. “Big Data 2 Knowledge,” hosted by the National Institutes of Health (NIH), also has conferences, training sessions, and webinars. These events are geared towards creating a research cohort that is expert in big data and data analytics.
3. The Strata + Hadoop World Big Data conference is a meeting where business decision makers, strategists, architects, developers, and analysts gather to discuss big data and data science. At the conference you explore big data and hear what is emerging in the industry (http://conferences.oreilly.com/strata/hadoop-big-data-ca). O’Reilly Media and others put on this conference and afterwards post all the presentations to their web site, so even if you can’t attend, you can hear about cutting-edge big data.
3.1.2.2 Big Data Books and Articles
A tried and traditional way to learn about any subject is through books, journal articles, and white papers. The following are basic references to get you started in big data.
1. Anderson C. Creating a Data-Driven Organization. Sebastopol, CA: O’Reilly Media. 2015. http://shop.oreilly.com/product/0636920035848.do
2. Betts R, Hugg J. Fast Data: Smart and at Scale. Sebastopol, CA: O’Reilly Media. 2015. https://voltdb.com/blog/introducing-fast-data-smart-and-scale-voltdbs-new-recipes-ebook
3. Brennan PF, Bakken S. Nursing needs big data and big data needs nursing. Journal of Nursing Scholarship. 2015;47:477–484.
4. Chartier T. Big Data: How Data Analytics Is Transforming the World. Chantilly, VA: The Great Courses. 2014. (includes video lectures). http://www.thegreatcourses.com/courses/big-data-how-data-analytics-is-transforming-the-world.html
5. Davenport T, Dyche J. Big data in big companies. 2013. http://www.sas.com/reg/gen/corp/2266746. Accessed 15 Dec 2015.
6. Mayer-Schonberger V, Cukier K. Big Data: A revolution that will transform how we live, work, and think. New York: Houghton Mifflin Harcourt Publishing. 2013.
7. O’Reilly Radar Team. Planning for big data: A CIO’s handbook to the changing data landscape. Sebastopol, CA: O’Reilly Media. 2012. http://www.oreilly.com/data/free/planning-for-big-data.csp
8. O’Reilly Team. Big data now. Sebastopol, CA: O’Reilly Media. 2012.
9. Patil DJ, Mason H. Data driven: Creating a data culture. Sebastopol, CA: O’Reilly Media. 2015. http://datasciencereport.com/2015/07/31/free-ebook-data-driven-creating-a-data-culture-by-chief-data-scientists-dj-patil-hilary-mason/#.Vp6x33n2bL8
3.1.2.3 Big Data Videos
For those who need to see and hear, videos are great. Below are some from YouTube and other websites. Don’t forget to look at Tim Chartier’s work listed in the Books section. Great Courses combine expert faculty, a book, and video lectures. Explore TED Talks for more information about Big Data.
1. Big Data Tutorials and TED Talks, http://www.analyticsvidhya.com/blog/2015/07/big-data-analytics-youtube-ted-resources/
2. Kenneth Cukier: Big data is better data, https://www.youtube.com/watch?v=8pHzROP1D-w
3. The Secret Life of Big Data | Intel, https://www.youtube.com/watch?v=CNoi-XqwJnA (a good overview of the history of Big Data, a must watch)
4. What is Big Data? https://www.youtube.com/watch?v=c4BwefH5Ve8
5. What is BIG DATA? BIG DATA Tutorial for Beginners, https://www.youtube.com/watch?v=2NLyIqU-xwg
6. What Is Apache Hadoop? http://hadoop.apache.org/
7. What is Big Data and Hadoop? https://www.youtube.com/watch?v=FHVuRxJpiwI
8. What Does The Internet of Things Mean? https://www.youtube.com/watch?v=Q3ur8wzzhBU
9. MapReduce, https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html
3.1.2.4 Big Data Web Sites
Many companies provide information, free books, white papers, tutorials, and free trial software on their web sites. These sites are a rich resource. Gartner and Forrester are companies that evaluate and rate emerging companies and products in the big data industry.
1. IBM Big Data and Analytics platform, now known as IBM Watson Foundations, http://www.ibmbigdataanalytics.com
2. Forrester Wave, https://www.forrester.com/The+Forrester+Wave+Big+Data+Hadoop+Distributions+Q1+2016/fulltext/-/E-res121574#AST1022630
3. Gartner, http://www.gartner.com/technology/research/methodologies/research_mq.jsp
   (a) Magic Quadrant
   (b) Hype Cycle
   (c) Critical Capabilities
4. Intel Processors, http://www.intel.com/content/www/us/en/homepage.html
   (a) The Butterfly Dress, https://www.youtube.com/watch?v=6ELuq3CzJys (a bit of fun with data and technology)
   (b) 50th Anniversary of Moore’s Law, http://newsroom.intel.com/docs/DOC-6429 (if you are in informatics, you must know about Moore’s Law)
   (c) How Intel Gave Stephen Hawking his Voice, http://www.wired.com/2015/01/intel-gave-stephen-hawking-voice; https://www.youtube.com/watch?v=JA0AZUj2lOs
5. Kaggle, www.kaggle.com
6. O’Reilly Media, https://www.oreilly.com/topics/data
7. VoltDB, https://voltdb.com
8. Yuhanna N. (August 3, 2015). The Forrester Wave: In-Memory Database Platforms, Q3 2015. http://go.sap.com/docs/download/2015/08/4481ad9e-3a7c-0010-82c7-eda71af511fa.pdf
3.1.3 Resources for Data Science
Data science is composed of data wrangling and data analysis. Data wrangling is the process of cleaning and mapping data from one “raw” form into another format that is ready for analysis. Algorithms can then be applied to make sense of big data. The following resources have been selected for learners who are just beginning their exploration of data science, along with a few that will stretch their knowledge towards competence. As you do your own searches, you will find many more. This listing will get you into the field and the Internet space to find more resources that fit your learning style.
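The wrangling step described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration under stated assumptions, not a recommended pipeline: the raw export, its column names, the two date formats, and the pounds-to-kilograms mapping are all hypothetical.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw export: inconsistent casing, stray whitespace,
# two different date formats, and a missing weight.
RAW = """patient_id,visit_date,weight_lb
 A01 ,2016-01-05,150
a02,01/07/2016,
A03,2016-01-09, 182
"""

LB_PER_KG = 2.20462


def parse_date(text):
    """Try each assumed date format and return an ISO 8601 date string."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unrecognized format is kept as a missing value


def wrangle(raw_text):
    """Clean each raw record and map it into an analysis-ready form."""
    rows = []
    for rec in csv.DictReader(io.StringIO(raw_text)):
        weight = rec["weight_lb"].strip()
        rows.append({
            "patient_id": rec["patient_id"].strip().upper(),
            "visit_date": parse_date(rec["visit_date"]),
            # Map pounds to kilograms; keep missing values as None.
            "weight_kg": round(float(weight) / LB_PER_KG, 1) if weight else None,
        })
    return rows


clean = wrangle(RAW)
for row in clean:
    print(row)
```

Dedicated wrangling tools automate this kind of work at scale, but the underlying steps of trimming, standardizing, and converting are the same.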
3.1.3.1 Data Science Conferences
Conferences are good places to explore a new field or to gain more understanding of a field in which you already have expertise. Networking is key at these events and can link you to others for future project work.
1. “Big Data 2 Knowledge,” hosted by the National Institutes of Health (NIH), has conferences, training sessions, and webinars. These events are geared towards creating a research cohort that is expert in big data and data analytics.
2. The Data Science Conference, http://www.thedatascienceconference.com
3.1.3.2 Data Science Books and Articles
A tried and traditional way to learn about any subject is through books, journal articles, and white papers. The following are basic references to get you started in data science.
1. Ghavami PK. Clinical Intelligence: The big data analytics revolution in healthcare: A framework for clinical and business intelligence. CreateSpace Independent Publishing Platform. 2014.
2. Grus J. Data science from scratch: First principles with python. Sebastopol, CA: O’Reilly Media. 2015.
3. Gualtieri M, Curran R. The Forrester Wave: Big data predictive analytics solutions, Q2, 2015. April 1, 2015. https://www.sas.com/content/dam/SAS/en_us/doc/analystreport/forrester-wave-predictive-analytics-106811.pdf
4. Janert PK. Data analysis with open source tools: A hands-on guide for programmers and data scientists. Sebastopol, CA: O’Reilly Media. 2010.
5. Loukides M. What is data science? Sebastopol, CA: O’Reilly Media. 2012.
6. Marconi K, Lehmann H. Big data and health analytics. Boca Raton, FL: CRC Press. 2015.
7. O’Neil C, Schutt R. Doing data science: Straight talk from the frontline. Sebastopol, CA: O’Reilly Media. 2015.
8. Optum. Getting from big data to good data: Creating a foundation for actionable analytics. 2015. https://www.optum.com/content/dam/optum/CMOSpark%20Hub%20Resources/White%20Papers/OPT_WhitePaper_ClinicalAnalytics_ONLINE_031414.pdf
9. Patil DJ. Building data science teams: The skills, tools, and perspectives behind great data science groups. Sebastopol, CA: O’Reilly Media. 2011.
10. Provost F, Fawcett T. Data science for business: What you need to know about data mining and data-analytic thinking. Sebastopol, CA: O’Reilly Media. 2013.
11. Rattenbury T, Hellerstein JM, Heer J, Kandel S. Data wrangling: Techniques and concepts for agile analysts. Sebastopol, CA: O’Reilly Media. 2015.
12. Tailor K. The patient revolution: How big data and analytics are transforming the health care experience. Hoboken, NJ: John Wiley and Sons. 2016.
13. Trifacta. Six Core Data Wrangling Activities. 2015. https://www.trifacta.com/wp-content/uploads/2015/11/six-core-data-wrangling-activities-ebook.pdf. Accessed 15 Jan 2016.
3.1.3.3 Data Science Videos
For those who need to see and hear, videos are great. Below are some from YouTube and other websites. Explore TED Talks for more information about Big Data.
1. Analytics 2013 - Keynote - Jim Goodnight, SAS, https://www.youtube.com/watch?v=AEI0fBQYJ1c
2. Big Data Analytics: The Revolution Has Just Begun, https://www.youtube.com/watch?v=ceeiUAmbfZk
3. Building Data Science Teams, https://www.youtube.com/watch?v=98NrsLE6ot4
4. Deep Learning: Intelligence from Big Data, https://www.youtube.com/watch?v=czLI3oLDe8M
5. The Future of Data Science - Data Science @ Stanford, https://www.youtube.com/watch?v=hxXIJnjC_HI
6. The Patient Revolution: How Big Data and Analytics Are Transforming the Health Care Experience, https://www.youtube.com/watch?v=oDztVSDUbxo
3.1.3.4 Data Science Web Sites
Many companies provide information, free books, white papers, tutorials, and free trial software on their web sites. These sites are a rich resource.
1. Alteryx, http://www.alteryx.com
2. Data Science at NIH, https://datascience.nih.gov/bd2k
3. Kaggle: The Home of Data Science, https://www.kaggle.com
4. Python Programming Language, https://www.python.org/
5. R Programming Language, https://www.r-project.org/about.html
6. Trifacta, https://www.trifacta.com/support
3.1.4 Resources for Data Visualization
Data visualization is the third part of big data. Humans can absorb more data when it is depicted in images or graphs. The following resources have been selected for learners who are just beginning their exploration of data visualization, along with a few that will stretch their knowledge towards competence. As you do your own searches, you will find many more. This listing will get you into the field and the Internet space to find more resources that fit your learning style.
3.1.4.1 Data Visualization Conferences
Conferences are good places to explore a new field or to gain more understanding of a field in which you already have expertise. Networking is key at these events and can link you to others for future project work. Most conferences on big data and data science include presentations on data visualization.
3.1.4.2 Data Visualization Books and Articles
A tried and traditional way to learn about any subject is through books, journal articles, and white papers. The following are basic references to get you started in data visualization.
1. Beegel J. Infographics for dummies. Hoboken, NJ: John Wiley & Sons. 2014.
2. Few S. Now you see it: Simple visualization techniques for quantitative analysis. Oakland, CA: Analytics Press. 2009.
3. Harris RL. Information graphics: A comprehensive reference. Atlanta, GA: Management Graphics. 1996.
4. Jones B. Communicating data with tableau: Designing, developing, and delivering data visualization. Sebastopol, CA: O’Reilly Media. 2014. http://cdn.oreillystatic.com/oreilly/booksamplers/9781449372026_sampler.pdf
5. Knaflic CN. Storytelling with data: A data visualization guide for business professionals. Hoboken, NJ: John Wiley & Sons. 2015.
6. Tufte ER. Envisioning information. Cheshire, CT: Graphics Press. 1990.
7. Tufte ER. The visual display of quantitative information. Cheshire, CT: Graphics Press. 1983. (This is the classic text in visualization.)
8. Tufte ER. Visual explanations: Images and quantities, evidence and narrative. Cheshire, CT: Graphics Press. 1997.
9. Yau N. Data points: Visualization that means something. Indianapolis, IN: John Wiley & Sons. 2013.
10. Yau N. Visualize this: The FlowingData guide to design, visualization, and statistics. Indianapolis, IN: John Wiley & Sons. 2011.
3.1.4.3 Data Visualization Videos
For those who need to see and hear, videos are great. Below are some from YouTube and other websites. Explore TED Talks for more information about Big Data.
1. The beauty of data visualization, https://www.youtube.com/watch?v=5Zg-C8AAIGg
2. The best stats you’ve ever seen, https://www.youtube.com/watch?v=usdJgEwMinM
3. Designing Data Visualizations, https://www.youtube.com/watch?v=lTAeMU2XI4U
4. The Future of Data Visualization, https://www.youtube.com/watch?v=vc1bq0qIKoA
5. Introduction to Data Visualization, https://www.youtube.com/watch?v=XIgjTuDGXYY
3.1.4.4 Data Visualization Web Sites
Many companies provide information, free books, white papers, tutorials, and free software on their web sites.
1. FlowingData, https://flowingdata.com
2. SAS, http://www.sas.com/en_us/home.html
   (a) Data visualization and why it is important, http://www.sas.com/en_us/insights/big-data/data-visualization.html
3. Tableau, http://www.tableau.com/
   (a) Tableau. (2015). The 5 Most Influential Data Visualizations of All Time. http://www.tableau.com/top-5-most-influential-data-visualizations (note that Florence Nightingale’s is the number two graph)
   (b) Visual Analysis Best Practices: Simple Techniques for Making Every Data Visualization Useful and Beautiful, http://get.tableau.com/asset/10-tips-to-create-useful-beautiful-visualizations.html
4. Trifacta, https://www.trifacta.com
3.1.5 Organizations of Interest
As the fields of big data, data science, and data visualization evolve, professional organizations will be formed. Listservs and blogs will be created. Academia will offer courses and degree programs. Certification and accreditation organizations will help to establish quality programs and individual performance. The following are just a sampling of what exists.
3.1.5.1 Professional Associations
Professionals will form professional organizations as they define their discipline. The organizations provide a forum for discussing practice, competencies, education, and the future.
1. American Statistical Association, http://www.amstat.org/
2. American Association of Big Data Professionals, https://aabdp.org/
   (a) Offers certification in various Big Data roles, https://aabdp.org/certifications.html
3. Data Science Association, http://www.datascienceassn.org/
4. Digital Analytics Association, http://www.digitalanalyticsassociation.org/
3.1.5.2 Listservs: A Sampling
Most web sites, organizations, industries, and publishers have listservs. This is a very efficient way to keep up with what is happening in these areas. A listserv is pushed to your email and enables you to see the latest thoughts, conferences, books, and software in an industry that is evolving rapidly.
1. 10 Data Science Newsletters To Subscribe To, https://datascience.berkeley.edu/10-data-science-newsletters-subscribe
2. Information Management, http://www.information-management.com/news/big-data-analytics/Big-Data-Scientist-Careers-10026908-1.html
3. O’Reilly Data Newsletter, http://www.oreilly.com/data/newsletter.html. Sign up to get the latest information about Big Data, Data Analytics, Data Visualization, and Conferences.
3.1.5.3 Certificates and Training: A Sampling
As jobs in these fields become more widely available, the demand for these skills will grow. Online education and formal degrees will become important for employers to consider. Certification may make a difference for employment.
1. Data Science at Coursera, https://www.coursera.org/specializations/jhu-data-science
2. Data at Coursera, https://www.coursera.org/specializations/big-data
3. SAS Certification program, http://support.sas.com/certify/index.html
4. MIT Professional Education, https://mitprofessionalx.mit.edu/about
5. R Programming, https://www.coursera.org/learn/r-programming
3.1.5.4 Degree Programs: A Sampling
Degree programs are proliferating as the demand for big data professionals and data scientists increases. It is important to choose carefully before investing time and money in a program. Always look for programs that are accredited. The university or college should be accredited by an accrediting agency recognized by the US Department of Education. The department or school that houses the program should also be accredited by the appropriate accreditor. Accreditation helps assure the quality of the education.
1. 23 Great Schools with Master’s Programs in Data Science, http://www.mastersindatascience.org/schools/23-great-schools-with-masters-programs-in-data-science
2. Carnegie Mellon University, http://www.cmu.edu/graduate/data-science/
3. Harvard, http://online-learning.harvard.edu/course/big-data-analytics
4. List of Graduate Programs in Big Data & Data Science, http://www.amstat.org/education/bigdata.cfm
5. Map of University Programs in Big Data Analytics, http://data-informed.com/bigdata_university_map/
6. Northwestern Kellogg School of Management, http://www.kellogg.northwestern.edu/execed/programs/bigdata.aspx?gclid=CLTa_Jf5u8oCFYVFaQodCpwHag
3.1.6 Assessment of Competencies
Teachers and students have used Bloom’s Taxonomy to create objectives that specify what is to be learned. The levels of the taxonomy can also be used to guide evaluation of the student’s attainment of these objectives. In 2002, Bloom’s Taxonomy was revised to reflect cognitive processes as well as knowledge attainment (http://www.unco.edu/cetl/sir/stating_outcome/documents/Krathwohl.pdf). The new taxonomic hierarchy is as follows (Krathwohl, 2002, p. 215):
1. “Remember: retrieving relevant knowledge from long-term memory
2. Understand: determining the meaning of information
3. Apply: using a procedure in a given situation
4. Analyze: breaking material into its constituent parts and detecting the relationships between the parts and the whole
5. Evaluate: making judgements based on criteria
6. Create: putting elements together to form a coherent whole or make a product.”
For big data and data science assignments, the graduate student should be able to master the levels of “remember, understand, and apply” by engaging with the above resources. Objective assessments, in the form of tests, can then be used to determine mastery. Performance assessments are used to evaluate achievement of the higher levels of the taxonomy: analyze, evaluate, and create. Experts and faculty conduct performance assessments using case studies, simulations, projects, presentations, or portfolios.
3.1.7 Learning Activities
The following are several learning activities designed to help you apply the knowledge and skills learned from the above resources. The Bloom level for each activity is listed.
1. Conduct a web search on Hadoop and data warehouses. What did you learn about big data? What are the issues in storing and accessing data that has volume, velocity, and variety? Define Oozie, Pig, ZooKeeper, Hive, MapReduce, and Spark. How are they used in big data initiatives? (Bloom level: Understand)
2. A good source of data for practicing wrangling, analysis, and visualization is DATA.gov, http://www.data.gov. Download a data file and one of the free trial software packages, and experiment. Trifacta lets you work on data wrangling. Excel can help with analysis. Tableau can help with visualization. Other sources of data are:
   (a)
   (b)
   (c) http://www.pewresearch.org/data/download-datasets
   (Bloom level: Apply)
3. Take a data set and graph the data five different ways, e.g., scatter plot, histogram, radar chart, or other types of graphs. What insight did you get from looking at the graphs? What analytic questions would you like to pursue based on the graphs? Were the graphs consistent? Was there one that represented the data best, and why? (Bloom level: Analyze)
4. Keep a log of data that you personally generate through online use, mobile devices, smartphones, email, music, videos, pictures, financial transactions, and fitness/health apps. What format is this data in? Conduct an exploratory data analysis. Visualize the results several ways. Evaluate the visualizations using Yau’s (2013) four components: visual cues, coordinate system, scale, and context. (Bloom level: Evaluate)
5. Create a list of keywords and a glossary for a document using Python. Download the Python 3.4.4 installer (.msi) from https://www.python.org/downloads and numpy-1.11.0.zip from http://www.numpy.org. Select a document and save it as a '.txt' file. On Windows, if the file path contains \U (as in C:\Users), replace it with \\U so the path will parse, because Python treats \U in a string as the start of a Unicode escape. Develop a Python script to determine word frequency in the document (http://programminghistorian.org/lessons/counting-frequencies). Wrangle the data so that only words are left, and remove stop words. From the remaining list, select keywords and glossary words. (Bloom level: Create)
3.1.8 Guidance for Learners and Faculty Using the Module
This case study has provided resources for faculty and students to learn about big data, data science, and data visualization. The best strategy is to select the resources that best match your learning style (visual, auditory, or tactile) and interact with them first. You may also want to use various search engines to find other information about big data, data science, and data visualization. All online resources were accessed in January or February 2016. Download some programs and data, and explore the process of wrangling, analysis, and visualization.
© 2017 Springer International Publishing AG
Warren, J.J. (2017). A Big Data Primer. In: Delaney, C., Weaver, C., Warren, J., Clancy, T., Simpson, R. (eds) Big Data-Enabled Nursing. Health Informatics. Springer, Cham. https://doi.org/10.1007/978-3-319-53300-1_3
DOI: https://doi.org/10.1007/978-3-319-53300-1_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-53299-8
Online ISBN: 978-3-319-53300-1
eBook Packages: Medicine (R0)