TidyTuesday
    • About TidyTuesday
    • Datasets
      • 2025
      • 2024
      • 2023
      • 2022
      • 2021
      • 2020
      • 2019
      • 2018
    • Useful links

    On this page

    • Campus Pride Index
      • The Data
      • How to Participate
        • Data Dictionary
    • pride_index.csv
    • pride_index_tags.csv
      • Cleaning Script

    Campus Pride Index

    Happy Pride Month! Check out the {gglgbtq} package for LGBTQ-related themes and color palettes!

    The data this week comes from the Campus Pride Index.

    Since 2007, the Campus Pride Index has been the premier LGBTQ national benchmarking tool for colleges and universities to create safer, more inclusive campus communities. The free online tool allows prospective students, families/parents and those interested in higher education to search a database of LGBTQ-friendly campuses who have come out to improve the academic experience and quality of campus life.

    The website has additional information about each campus, including a more detailed breakdown of their Campus Pride Index rating.

    Is there a relationship between the size of a campus or the surrounding community and its rating? What about the state in which the campus is located?

    The Data

    # Option 1: tidytuesdayR package 
    ## install.packages("tidytuesdayR")
    
    tuesdata <- tidytuesdayR::tt_load('2024-06-11')
    ## OR
    tuesdata <- tidytuesdayR::tt_load(2024, week = 24)
    
    pride_index <- tuesdata$pride_index
    pride_index_tags <- tuesdata$pride_index_tags
    
    # Option 2: Read directly from GitHub
    
    pride_index <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-06-11/pride_index.csv')
    pride_index_tags <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-06-11/pride_index_tags.csv')

    How to Participate

    • Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
    • Create a visualization, a model, a shiny app, or some other piece of data-science-related output, using R or another programming language.
    • Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.

    Data Dictionary

    pride_index.csv

    variable class description
    campus_name character Name of the college or university
    campus_location character Location of the college or university
    rating numeric Campus Pride Index star rating from 1 to 5 (including half-star ratings)
    students numeric Full-time equivalent student population
    community_type character Locale where the campus is located. One of “large urban city” (500k+), “medium city” (100k to 500k), “small city” (25k to 100k), “small town” (10k to 25k), “very small town” (5k to 10k), “rural community” (< 5k)

    pride_index_tags.csv

    variable class description
    campus_name character Name of the college or university
    campus_location character Location of the college or university
    public logical Whether the campus is tagged as public (TRUE or NA)
    private logical Whether the campus is tagged as private (TRUE or NA)
    doctoral logical Whether the campus is tagged as a doctoral/research university (TRUE or NA)
    masters logical Whether the campus is tagged as offering master’s degrees (TRUE or NA)
    baccalaureate logical Whether the campus is tagged as offering baccalaureate degrees (TRUE or NA)
    community logical Whether the campus is tagged as a community college (TRUE or NA)
    residential logical Whether the campus is tagged as having a residential campus (TRUE or NA)
    nonresidential logical Whether the campus is tagged as having a nonresidential campus (TRUE or NA)
    liberal_arts logical Whether the campus is tagged as a liberal arts college (TRUE or NA)
    technical logical Whether the campus is tagged as a technical institute (TRUE or NA)
    religious logical Whether the campus is tagged as having a religious affiliation (TRUE or NA)
    military logical Whether the campus is tagged as a military institute (TRUE or NA)
    hbcu logical Whether the campus is tagged as a historically black college/university (TRUE or NA)
    hispanic_serving logical Whether the campus is tagged as a Hispanic-serving institution (TRUE or NA)
    aapi_serving logical Whether the campus is tagged as an Asian-American- and Pacific-Islander-serving institution (TRUE or NA)
    other_minority_serving logical Whether the campus is tagged as an other-minority-serving institution (TRUE or NA)

    Cleaning Script

    library(tidyverse)
    library(here)
    library(fs)
    library(rvest)
    library(polite)
    
    working_dir <- here::here("data", "2024", "2024-06-11")
    session <- polite::bow(
      "https://campusprideindex.org/searchresults/display/0",
      user_agent = "TidyTuesday (https://tidytues.day, jonthegeek+tidytuesday@gmail.com)",
      delay = 0
    )
    pride_index_rows <- session |> 
      polite::scrape() |> 
      rvest::html_elements(".campus-item")
    
    pride_index_data <- tibble::tibble(
      campus_name = pride_index_rows |> 
        rvest::html_element(".s3 h4 a") |> 
        rvest::html_text2() |> 
        stringr::str_squish(),
      campus_location = pride_index_rows |> 
        rvest::html_element(".s3 p") |> 
        rvest::html_text2() |> 
        stringr::str_squish(),
      rating = pride_index_rows |>
        rvest::html_element(".rating") |>
        purrr::map_dbl(
          \(rating_node) {
            full <- rating_node |> 
              html_elements(".rating-icon-Star-Full") |> 
              length()
            half <- rating_node |> 
              html_elements(".rating-icon-Star-Half") |> 
              length()
            full + half / 2
          }
        ),
      students = pride_index_rows |> 
        rvest::html_element(".s1 .currency") |> 
        rvest::html_text2() |> 
        readr::parse_number() |> 
        as.integer(),
      community_type = pride_index_rows |> 
        rvest::html_element(".s1 + .s2") |> 
        rvest::html_text2() |> 
        stringr::str_remove("\\([^)]+\\)") |> 
        stringr::str_squish() |> 
        tolower(),
      other_data = pride_index_rows |> 
        rvest::html_element(".s3 + .s3") |> 
        rvest::html_text2() |> 
        stringr::str_remove_all("\r\n") |> 
        stringr::str_split("\n")
    )
    
    pride_index_tags <- pride_index_data |> 
      dplyr::select(campus_name, campus_location, other_data) |> 
      tidyr::unnest_longer(other_data) |> 
      dplyr::mutate(
        other_data = dplyr::case_match(
          other_data,
          "Doctoral/Research University" ~ "doctoral",
          "Public/State University" ~ "public",
          "Residential Campus" ~ "residential",
          "Asian American and Pacific Islander Serving Institution" ~ "aapi_serving",
          "Master's College/University" ~ "masters",
          "Baccalaureate College/University" ~ "baccalaureate",
          "Nonresidential Campus" ~ "nonresidential",
          "Community College" ~ "community",
          "Hispanic Serving Institution" ~ "hispanic_serving",
          "Liberal Arts College" ~ "liberal_arts",
          "Private Institution" ~ "private",
          "Religious Affiliation" ~ "religious",
          "Historically Black College/University" ~ "hbcu",
          "Military Institution" ~ "military",
          "Technical Institute" ~ "technical",
          "Other Minority Serving Institution" ~ "other_minority_serving"
        ),
        value = TRUE
      ) |> 
      tidyr::pivot_wider(
        names_from = other_data,
        values_from = value
      ) |> 
      dplyr::select(
        campus_name,
        campus_location,
        public, private, 
        doctoral, masters, baccalaureate, community,
        residential, nonresidential,
        liberal_arts, technical,
        religious, military, 
        hbcu, hispanic_serving, aapi_serving, other_minority_serving
      )
    
    pride_index <- pride_index_data |> 
      dplyr::select(-other_data)
    
    # Save -------------------------------------------------------------------------
    readr::write_csv(
      pride_index,
      fs::path(working_dir, "pride_index.csv")
    )
    readr::write_csv(
      pride_index_tags,
      fs::path(working_dir, "pride_index_tags.csv")
    )