Putting It All Together – R, the tidyverse Ecosystem, and APIs

Introduction to Data Science in Biostatistics

Abstract

The purpose of this lesson is to summarize how R, and more specifically R's tidyverse ecosystem, is used in support of data science. A few key concepts about the tidyverse ecosystem are reinforced: (1) use of an Application Programming Interface (API) to obtain data; (2) the need to put data into tidy format; and (3) use of the tidyverse ecosystem in support of statistical analyses and the creation of figures, maps, and other visuals. An introduction is also offered on how data scientists prepare reports and how the supporting software and processes can be integrated into R. A few comments are made on next steps for those who wish to continue in data science. Perhaps most importantly for those in the early days of career exploration and advancement, there is also a discussion of the soft skills needed by those who wish to become leaders in data science and, in turn, use data science to promote societal improvement.

Notes

  1. For those who are not well acquainted with dairy cattle and the characteristics of the leading breeds, review State and national standardized lactation averages by breed for cows calving in 2007 (https://queries.uscdcb.com/publish/dhi/dhi09/laall.shtml). It gives breed-level statistics not only on milk production (lb) but also on fat and protein production (% and lb). Many dairy herdsmen place a high value on fat and protein production (% and lb) and are willing to accept lower milk production by weight (lb).

  2. Whether to treat data with a nonparametric or a parametric approach is often a matter of personal judgment or group consensus. Ideally, the final decision is based not only on observation of the data but also on applied tests such as the Anderson-Darling test or the Shapiro-Wilk test. Further discussion of this issue is beyond the scope of this lesson.

  3. For those with a special interest in normal distribution and in choosing between nonparametric and parametric inferential tests, look at the dlookr::normality() function. It can be used alone, as in dlookr::normality(Pounds), or chained after dplyr::group_by() to test for normal distribution by group.

  4. Consider a map of the United Kingdom of Great Britain and Northern Ireland (UK). Should this map include only England, Northern Ireland, Scotland, and Wales? How should the Republic of Ireland show on this map, given that it shares a land mass with Northern Ireland? Then add to this complexity the Channel Islands, such as the Bailiwick of Guernsey and the Bailiwick of Jersey. Should these two entities be included in a map of the UK? Should the Isle of Man also show on the map? Should the British Virgin Islands, the Falkland Islands, Gibraltar, and other British Overseas Territories show on the map? What about the Chagos Archipelago? Should Rockall be included? The complexity of maps goes far beyond the use of R or any other software for their creation.

  5. Is it possible to distinguish the borders for Luxembourg in this map?

  6. Notice that data may not be available for all geographic entities, and that there may be concerns about the reliability of some data.

  7. Review Federal Information Processing System (FIPS) Codes for States and Counties (https://transition.fcc.gov/oet/info/maps/census/fips/fips.txt) for state-by-state and county-by-county FIPS codes.

  8. Review materials such as ZIP Code Tabulation Areas (ZCTAs) (https://www.census.gov/programs-surveys/geography/guidance/geo-areas/zctas.html) to learn how United States Postal Service ZIP Codes are accommodated when working with output from the Census Bureau. Census Bureau ZCTAs are similar to Postal Service ZIP Codes, but they are not identical.

  9. Look at the accommodation that was needed for the county named Cape May. What is the issue? Of the many possible ways to approach this issue, what tidyverse tool is best for this accommodation?

  10. Many documents are also prepared with word processing software. It is not necessary to comment much on its use, other than to mention that some of the most popular word processing packages are proprietary and it cannot be assumed that interested peers and students have access to them. In contrast, the typesetting approach demonstrated in this section (both R Markdown and LaTeX) is based on markup software that can be obtained legally and freely.

  11. Although there is no desire to make negative comments on the use of word processing software, investigate the distinction between the expressions WYSIWYG (what you see is what you get) and WYSIAYG (what you see is all you get) when deciding whether to prepare a document with word processing software or with a markup language and accompanying software. Decide whether the ability to include comments, syntax, and other text directly in the manuscript, text that is not visible in the final report, has value when selecting document preparation software.
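
Notes 2 and 3 mention testing for normal distribution, both for a whole variable and by group. The following is a minimal sketch of that idea, not the chapter's own code. It uses only base R (stats::shapiro.test() with split()) so that it runs without extra packages, rather than the dlookr::normality() and dplyr::group_by() chain suggested in Note 3. The data are simulated, and the names milk, Group, and Pounds are assumptions made for illustration.

```r
# Simulated data only: 'Group' and 'Pounds' are hypothetical stand-ins,
# not values taken from the chapter's dataset.
set.seed(42)
milk <- data.frame(
  Group  = rep(c("A", "B"), each = 30),
  Pounds = c(rnorm(30, mean = 100, sd = 10),  # drawn from a normal distribution
             rexp(30, rate = 1 / 100))        # drawn from a right-skewed distribution
)

# Shapiro-Wilk test on the whole column.
overall <- shapiro.test(milk$Pounds)

# The same test applied separately to each group.
by_group <- do.call(rbind, lapply(split(milk, milk$Group), function(d) {
  res <- shapiro.test(d$Pounds)
  data.frame(
    Group   = d$Group[1],
    n       = nrow(d),
    W       = unname(res$statistic),  # Shapiro-Wilk W statistic
    p_value = res$p.value
  )
}))

print(overall)
print(by_group)
```

A small p-value in a given row suggests that the group departs from a normal distribution. As Note 2 points out, such a test informs the choice between nonparametric and parametric methods but does not replace observation of the data.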

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

MacFarland, T.W. (2024). Putting It All Together – R, the tidyverse Ecosystem, and APIs. In: Introduction to Data Science in Biostatistics. Springer, Cham. https://doi.org/10.1007/978-3-031-46383-9_7