    Superbowl commercials

    The data this week comes from FiveThirtyEight, which has a corresponding article on the topic. Note that the original source was superbowl-ads.com. You can watch all of the ads via the FiveThirtyEight article.

    Like millions of viewers who tune into the big game year after year, we at FiveThirtyEight LOVE Super Bowl commercials. We love them so much, in fact, that we wanted to know everything about them … by analyzing and categorizing them, of course. We dug into the defining characteristics of a Super Bowl ad, then grouped commercials based on which criteria they shared — and let me tell you, we found some really weird clusters of commercials.

    We watched 233 ads from the 10 brands that aired the most spots in all 21 Super Bowls this century, according to superbowl-ads.com. While we watched, we evaluated ads using seven specific criteria, marking every spot as a “yes” or “no” for each: funny, shows the product quickly, patriotic, contains a celebrity, contains danger, contains animals, and uses sexuality.

    Get the data here

    # Get the Data
    
    # Read in with tidytuesdayR package 
    # Install from CRAN via: install.packages("tidytuesdayR")
    # This loads the readme and all the datasets for the week of interest
    
    # Either ISO-8601 date or year/week works!
    
    tuesdata <- tidytuesdayR::tt_load('2021-03-02')
    tuesdata <- tidytuesdayR::tt_load(2021, week = 10)
    
    youtube <- tuesdata$youtube
    
    # Or read in the data manually
    
    youtube <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-03-02/youtube.csv')
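
    Once the data is loaded, the seven yes/no criteria can be explored directly. Below is a minimal sketch (not part of the original post) that tallies how often each characteristic appears in each brand's ads; it assumes youtube has been loaded with one of the commands above.

    # Share of each brand's ads showing each of the seven characteristics
    # (sketch only; assumes `youtube` was loaded as above)
    library(tidyverse)
    
    youtube %>% 
      pivot_longer(
        c(funny, show_product_quickly, patriotic, celebrity, danger, animals, use_sex),
        names_to = "criterion", values_to = "present"
      ) %>% 
      group_by(brand, criterion) %>% 
      summarise(share = mean(present, na.rm = TRUE), .groups = "drop") %>% 
      arrange(brand, desc(share))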

    Data Dictionary

    youtube.csv

    variable                    class      description
    year                        double     Super Bowl year
    brand                       character  Brand featured in the commercial
    superbowl_ads_dot_com_url   character  superbowl-ads.com URL for the ad
    youtube_url                 character  YouTube URL
    funny                       logical    Contains humor
    show_product_quickly        logical    Shows the product quickly
    patriotic                   logical    Patriotic
    celebrity                   logical    Contains a celebrity
    danger                      logical    Contains danger
    animals                     logical    Contains animals
    use_sex                     logical    Uses sexuality
    id                          character  YouTube video ID
    kind                        character  YouTube resource kind
    etag                        character  YouTube etag
    view_count                  integer    YouTube view count
    like_count                  integer    YouTube like count
    dislike_count               integer    YouTube dislike count
    favorite_count              integer    YouTube favorite count
    comment_count               integer    YouTube comment count
    published_at                character  Date and time the video was published on YouTube
    title                       character  YouTube video title
    description                 character  YouTube video description
    thumbnail                   character  YouTube thumbnail URL
    channel_title               character  YouTube channel name
    category_id                 character  YouTube content category ID
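
    Note that published_at is stored as character. Here is a small sketch (not from the original readme, and assuming youtube is loaded as above and that published_at holds ISO 8601 timestamps from the YouTube API) that parses the timestamp and computes a simple like rate per video.

    # Parse the publication timestamp and compute likes per view
    # (sketch only; assumes `youtube` is loaded as above)
    library(tidyverse)
    library(lubridate)
    
    youtube_clean <- youtube %>% 
      mutate(
        published_at = ymd_hms(published_at),
        like_rate = like_count / view_count
      )
    
    # Median views and like rate by Super Bowl year
    youtube_clean %>% 
      group_by(year) %>% 
      summarise(
        median_views = median(view_count, na.rm = TRUE),
        median_like_rate = median(like_rate, na.rm = TRUE)
      )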

    Cleaning Script

    Note that running this script is optional and NOT required; the cleaned youtube.csv is already available above. The YouTube statistics were downloaded with httr by querying the YouTube Data API with an API key.

    library(tidyverse)
    library(tuber)
    library(rvest)
    library(httr)
    
    raw_data <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/superbowl-ads/main/superbowl-ads.csv")
    
    # Strip the YouTube URLs down to bare video ids, dropping ads with no URL
    all_ids <- raw_data$youtube_url %>% 
      str_remove_all("https://www.youtube.com/watch") %>% 
      str_remove("\\?v=") %>% 
      str_subset("NA", negate = TRUE)
    
    all_ids
    
    # Placeholder: substitute your own YouTube Data API key
    api_key <- "SUPER_SECRET_API"
    
    # Query the YouTube Data API videos endpoint for one batch of ids and
    # return the requested part (statistics by default) as a tibble
    get_youtube_data <- function(ids_in, query_type = "statistics"){
      
      url_in <- modify_url("https://www.googleapis.com/youtube/v3/videos", 
                         query = list(
                           "part" = query_type,
                           "id" = paste(ids_in, collapse=","),
                           "key" = api_key)
      )
      out_content <- content(GET(url_in), as = "parsed", type = "application/json")
      
      if(query_type == "statistics"){
        out_content$items %>% 
          enframe() %>% 
          unnest_wider(value) %>% 
          unnest_wider(statistics) %>% 
          janitor::clean_names()
        
      }
    }
    
    # Query the snippet part for video metadata (title, description, thumbnails, etc.)
    get_youtube_details <- function(ids_in){
      
      url_in <- modify_url("https://www.googleapis.com/youtube/v3/videos", 
                           query = list(
                             "part" = "snippet",
                             "id" = paste(ids_in, collapse=","),
                             "key" = api_key)
      )
      out_content <- content(GET(url_in), as = "parsed", type = "application/json")
      
      out_content$items %>% 
        enframe() %>% 
        unnest_wider(value) %>% 
        unnest_wider(snippet) %>% 
        janitor::clean_names()
    }
    
    # The videos endpoint accepts at most 50 ids per request, so query in batches
    all_vid_stats <- list(
      all_ids[1:50],
      all_ids[51:100],
      all_ids[101:150],
      all_ids[151:200],
      all_ids[201:length(all_ids)]
    ) %>% 
      map_dfr(get_youtube_data)
    
    all_vid_details <- list(
      all_ids[1:50],
      all_ids[51:100],
      all_ids[101:150],
      all_ids[151:200],
      all_ids[201:length(all_ids)]
    ) %>% 
      map_dfr(get_youtube_details)
    
    # Pull a single thumbnail URL out of the nested thumbnails list and keep the snippet fields of interest
    clean_details <- all_vid_details %>% 
      hoist(thumbnails, 
            thumbnail = list("standard", "url")) %>% 
      select(id, published_at, title:category_id, -tags, -thumbnails)
    
    # Combine the statistics and snippet details by video id
    combo_vid <- all_vid_stats %>% 
      left_join(clean_details) %>% 
      select(-name)
    
    # Re-derive the video id on the original FiveThirtyEight data and join on the YouTube fields
    all_youtube <- raw_data %>% 
      mutate(id = str_remove(youtube_url, "https://www.youtube.com/watch") %>% 
               str_remove("\\?v=")) %>% 
      left_join(combo_vid) %>% 
      mutate(across(view_count:comment_count, as.integer)) 
    
    write_csv(all_youtube, "2021/2021-03-02/youtube.csv")
    
    all_youtube %>% 
      glimpse()
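
    The hard-coded index ranges above exist because the videos endpoint accepts at most 50 ids per request. A more general approach, sketched below rather than taken from the original script, is to chunk the ids programmatically so the batching keeps working if the number of ads changes.

    # Alternative sketch: chunk ids into batches of at most 50
    # instead of hard-coding the index ranges
    id_batches <- split(all_ids, ceiling(seq_along(all_ids) / 50))
    
    all_vid_stats   <- map_dfr(id_batches, get_youtube_data)
    all_vid_details <- map_dfr(id_batches, get_youtube_details)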