TidyTuesday
    • About TidyTuesday
    • Datasets
      • 2025
      • 2024
      • 2023
      • 2022
      • 2021
      • 2020
      • 2019
      • 2018
    • Useful links

    On this page

    • Bechdel Test
      • Get the data here
      • Data Dictionary
    • raw_bechdel.csv
    • movies.csv
      • Cleaning Script

    Walmart shelf with DVD of Hunger Games movie

    Bechdel Test

    The data this week comes from FiveThirtyEight and the corresponding article from FiveThirtyEight.

    Audiences and creators know that on one level or another, there’s an inherent gender bias in the movie business — whether it’s the disproportionately low number of films with female leads, the process of pigeonholing actresses into predefined roles (action chick, romantic interest, middle-aged mother, etc.), or the lack of serious character development for women on screen compared to their male counterparts. What’s challenging is quantifying this dysfunction, putting numbers to a trend that is — at least anecdotally — a pretty clear reality.

    One of the most enduring tools to measure Hollywood’s gender bias is a test originally promoted by cartoonist Alison Bechdel in a 1985 strip from her “Dykes To Watch Out For” series. Bechdel said that if a movie can satisfy three criteria — there are at least two named women in the picture, they have a conversation with each other at some point, and that conversation isn’t about a male character — then it passes “The Rule,” whereby female characters are allocated a bare minimum of depth. You can see a copy of that strip here.

    • Bechdel Test data sourced via Bechdeltest.com API.

    raw_bechdel.csv includes data from 1970 - 2020, for ONLY bechdel testing, while the movies.csv includes IMDB scores, budget/gross revenue, and ratings but only from 1970 - 2013.

    Get the data here

    # Get the Data
    
    # Read in with tidytuesdayR package 
    # Install from CRAN via: install.packages("tidytuesdayR")
    # This loads the readme and all the datasets for the week of interest
    
    # Either ISO-8601 date or year/week works!
    
    tuesdata <- tidytuesdayR::tt_load('2021-03-09')
    tuesdata <- tidytuesdayR::tt_load(2021, week = 11)
    
    bechdel <- tuesdata$bechdel
    
    # Or read in the data manually
    
    raw_bechdel <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-03-09/raw_bechdel.csv')
    movies <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-03-09/movies.csv')

    Data Dictionary

    raw_bechdel.csv

    variable class description
    year integer Year of release
    id integer ID of film
    imdb_id character IMDB ID
    title character Title of film
    rating integer Rating (0-3), 0 = unscored, 1. It has to have at least two [named] women in it, 2. Who talk to each other, 3. About something besides a man

    movies.csv

    variable class description
    year double Year
    imdb character IMDB
    title character Title of movie
    test character Bechdel Test outcome
    clean_test character Bechdel Test cleaned
    binary character Binary pass/fail of bechdel
    budget double Budget as of release year
    domgross character Domestic gross in release year
    intgross character International gross in release year
    code character Code
    budget_2013 double Budget normalized to 2013
    domgross_2013 character Domestic gross normalized to 2013
    intgross_2013 character International gross normalized to 2013
    period_code double Period code
    decade_code double Decade Code
    imdb_id character IMDB ID
    plot character Plot of movie
    rated character Rating of movie
    response character Response?
    language character Language of film
    country character Country produced in
    writer character Writer of film
    metascore double Metascore rating (0-100)
    imdb_rating double IMDB Rating 0-10
    director character Director of movie
    released character Released date
    actors character Actors
    genre character Genre
    awards character Awards
    runtime character Runtime
    type character Type of film
    poster character Poster image
    imdb_votes character IMDB Votes
    error character Error?

    Cleaning Script

    library(tidyverse)
    library(jsonlite)
    
    raw_json <- jsonlite::parse_json(url("http://bechdeltest.com/api/v1/getAllMovies"))
    
    all_movies <- raw_json %>% 
      map_dfr(~as.data.frame(.x, stringsAsFactors = FALSE)) %>% 
      rename(imdb_id = imdbid) %>% 
      tibble()
    
    all_movies %>% 
      filter(year >= 1970) 
    
    
    
    cleaned_bechdel <- all_movies %>% 
      mutate(title = case_when(
        str_detect(title, ", The") ~ str_remove(title, ", The") %>% paste("The", .),
        TRUE ~ str_replace(title, "&#39;", "'")
      ))
    
    cleaned_bechdel %>% 
      write_csv("2021/2021-03-09/raw_bechdel.csv")
    
    # IMDB data ---------------------------------------------------------------
    
    
    imdb_json <- jsonlite::parse_json(url("https://raw.githubusercontent.com/brianckeegan/Bechdel/master/imdb_data.json"))
    
    all_imdb <- imdb_json %>%
      map_dfr(~as.data.frame(.x, stringsAsFactors = FALSE))
    
    cleaned_imdb <- all_imdb %>% 
      janitor::clean_names() %>% 
      mutate(metascore = parse_number(metascore),
             imdb_rating = parse_number(imdb_rating),
             year = as.integer(year)) %>% 
      mutate(imdb_id = str_remove(imdb_id, "tt")) %>% 
      tibble()
    
    all_imdb
    
    # 538 Data ----------------------------------------------------------------
    
    movies <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/bechdel/movies.csv")
    
    cleaned_movies <- movies %>% 
      mutate(imdb_id = str_remove(imdb, "tt")) 
    
    combo_movies <- cleaned_movies %>% 
      left_join(cleaned_imdb) %>% 
      janitor::clean_names() 
    
    combo_movies
    
    combo_movies %>% 
      write_csv("2021/2021-03-09/movies.csv")