Plastic Pollution

The data this week comes from Break Free from Plastic courtesy of Sarah Sauve.

Sarah put together a nice Blogpost on her approach to this data, which includes cleaning the data and a Shiny app!

Per Sarah:

I found out about Break Free From Plastic’s Brand Audits through my involvement with the local Social Justice Cooperative of Newfoundland and Labrador’s Zero Waste Action Team.

One of my colleagues and friends proposed an audit in St. John’s, partially to contribute to the global audit and as part of a bigger project to understand the sources of plastic in our city. We completed our audit in October 2020 and are the first submission to BFFP from Newfoundland! You can find our data presented in this Shiny dashboard.

It’s an interesting dataset, with lots of room to play around and so many options for visualization, plus plastic pollution is an important topic to talk about and raise awareness of! You can read BFFP’s Brand Audit Reports for 2018, 2019 and 2020 to get an idea of what they’ve done with the data.

I downloaded the raw data from her Google Drive, and have a short cleaning script at the bottom of this readme. Note that the data has already been combined, but feel free to play around with the raw data itself.

The data is available through Google Drive; you can find the 2019 data here and the 2020 data here.

Get the data here

# Get the Data

# Read in with tidytuesdayR package 
# Install from CRAN via: install.packages("tidytuesdayR")
# This loads the readme and all the datasets for the week of interest

# Either ISO-8601 date or year/week works!

tuesdata <- tidytuesdayR::tt_load('2021-01-26')
tuesdata <- tidytuesdayR::tt_load(2021, week = 5)

plastics <- tuesdata$plastics

# Or read in the data manually

plastics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-01-26/plastics.csv')

Data Dictionary

Note that the plastic types are not in tidy format, and you’ll likely want to pivot_longer().

The plastic is categorized by recycling codes.

`plastics.csv`

variable	class	description
country	character	Country of cleanup
year	double	Year (2019 or 2020)
parent_company	character	Source of plastic
empty	double	Category left empty count
hdpe	double	High density polyethylene count (Plastic milk containers, plastic bags, bottle caps, trash cans, oil cans, plastic lumber, toolboxes, supplement containers)
ldpe	double	Low density polyethylene count (Plastic bags, Ziploc bags, buckets, squeeze bottles, plastic tubes, chopping boards)
o	double	Category marked other count
pet	double	Polyester plastic count (Polyester fibers, soft drink bottles, food containers (also see plastic bottles)
pp	double	Polypropylene count (Flower pots, bumpers, car interior trim, industrial fibers, carry-out beverage cups, microwavable food containers, DVD keep cases)
ps	double	Polystyrene count (Toys, video cassettes, ashtrays, trunks, beverage/food coolers, beer cups, wine and champagne cups, carry-out food containers, Styrofoam)
pvc	double	PVC plastic count (Window frames, bottles for chemicals, flooring, plumbing pipes)
grand_total	double	Grand total count (all types of plastic)
num_events	double	Number of counting events
volunteers	double	Number of volunteers

Cleaning Script

NOTE: This is not necessary to use this data, but is just an example of how I prepared the plastics.csv dataset, which is already available.

library(tidyverse)
library(fs)

files_2020 <- fs::dir_ls("2020 BFFP National Data Results") %>% 
  str_subset("csv")

files_2019 <- fs::dir_ls("2019 Brand Audit Appendix _ Results by Country/Countries") %>% 
  str_subset("csv")

data_2020 <- files_2020 %>% 
  map_dfr(read_csv, col_types = cols(
    Country = col_character(),
    Parent_company = col_character(),
    Empty = col_double(),
    HDPE = col_double(),
    LDPE = col_double(),
    O = col_double(),
    PET = col_double(),
    PP = col_double(),
    PS = col_double(),
    PVC = col_double(),
    Grand_Total = col_character(),
    num_events = col_double(),
    volunteers = col_double()
  )) %>% 
  mutate(year = 2020, .after = Country) %>% 
  mutate(Grand_Total = parse_number(Grand_Total)) %>% 
  janitor::clean_names()

data_2019 <- files_2019 %>% 
  set_names(str_replace(., ".*[/]([^.]+)[.].*", "\\1")) %>% 
  map_dfr(read_csv, .id = "country", col_types = cols(
    Country = col_character(),
    Parent_company = col_character(),
    Empty = col_double(),
    HDPE = col_double(),
    LDPE = col_double(),
    O = col_double(),
    PET = col_double(),
    PP = col_double(),
    PS = col_double(),
    PVC = col_double(),
    Grand_Total = col_double(),
    num_events = col_double(),
    volunteers = col_double()
  )) %>% 
  select(country, everything()) %>% 
  mutate(year = 2019, .after = country) %>% 
  janitor::clean_names()  %>% 
  mutate(pp = if_else(is.na(pp_2), pp, pp_2 + pp),
         ps = if_else(is.na(ps_2), ps, ps + ps_2)) %>% 
  rename(parent_company = parent_co_final, num_events = number_of_events, volunteers= number_of_volunteers) %>% 
  select(-ps_2, -pp_2)

combo_data <- bind_rows(data_2019, data_2020) 

combo_data %>% 
  write_csv("2021/2021-01-26/plastics.csv")