TidyTuesday
    • About TidyTuesday
    • Datasets
      • 2025
      • 2024
      • 2023
      • 2022
      • 2021
      • 2020
      • 2019
      • 2018
    • Useful links

    On this page

    • American Idol data
      • The Data
      • How to Participate
        • Data Dictionary
    • auditions.csv
    • eliminations.csv
    • finalists.csv
    • ratings.csv
    • seasons.csv
    • songs.csv
      • Cleaning Script

    American Idol data

    This week we’re exploring American Idol data! This is a comprehensive dataset put together by kkakey.

    There’s so much data! What do you want to know about American Idol? Song choices, TV ratings, characteristics of winners?

    Data in this dataset comes from Wikipedia. Data collected on seasons 1-18 of American Idol.

    The Datasets * songs.csv - songs that contestants sang and competed with on American Idol from seasons 1-18 * auditions.csv - audition, cities, dates, and venues * elimination_chart.csv - eliminations by week. Data availability varies season-to-season based on season length and number of finalists competing * finalists.csv - information on top contestants, including birthday, hometown, and description * ratings.csv - episode ratings and views. * seasons.csv - season-level information, including season winner, runner-up, release dates, and judges

    The Data

    # Option 1: tidytuesdayR package 
    ## install.packages("tidytuesdayR")
    
    tuesdata <- tidytuesdayR::tt_load('2024-07-23')
    ## OR
    tuesdata <- tidytuesdayR::tt_load(2024, week = 30)
    
    auditions <- tuesdata$auditions
    eliminations <- tuesdata$eliminations
    finalists <- tuesdata$finalists
    ratings <- tuesdata$ratings
    seasons <- tuesdata$seasons
    songs <- tuesdata$songs
    
    # Option 2: Read directly from GitHub
    
    auditions <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-07-23/auditions.csv')
    eliminations <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-07-23/eliminations.csv')
    finalists <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-07-23/finalists.csv')
    ratings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-07-23/ratings.csv')
    seasons <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-07-23/seasons.csv')
    songs <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-07-23/songs.csv')

    How to Participate

    • Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
    • Create a visualization, a model, a shiny app, or some other piece of data-science-related output, using R or another programming language.
    • Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.

    Data Dictionary

    auditions.csv

    variable class description
    season double Season
    audition_date_start double Start date of audition
    audition_date_end double End date of audition
    audition_city character City where audition took place
    audition_venue character Preliminary location where auditions took place
    episodes character Episode numbers at this audition location
    episode_air_date character Date episode aired
    callback_venue character Filming and callback location where auditions took place
    callback_date_start double Start date of callback audition
    callback_date_end double End date of callback audition
    tickets_to_hollywood double Number of contestants selected from audition to go to Hollywood week
    guest_judge character Name of guest judge at audition

    eliminations.csv

    variable class description
    season double Season Number
    place character Place (or place range) contestant finished in competition
    gender character Gender of contestant
    contestant character Competitor name
    top_36 character Top 36 eliminations
    top_36_2 character Top 36 eliminations (week 2)
    top_36_3 character Top 36 eliminations (week 3)
    top_36_4 character Top 36 eliminations (week 4)
    top_32 character Top 32 eliminations
    top_32_2 character Top 32 eliminations (week 2)
    top_32_3 character Top 32 eliminations (week 3)
    top_32_4 character Top 32 eliminations (week 4)
    top_30 character Top 30 eliminations
    top_30_2 character Top 30 eliminations (week 2)
    top_30_3 character Top 30 eliminations (week 3)
    top_25 character Top 25 eliminations
    top_25_2 character Top 25 eliminations (week 2)
    top_25_3 character Top 25 eliminations (week 3)
    top_24 character Top 24 eliminations
    top_24_2 character Top 24 eliminations (week 2)
    top_24_3 character Top 24 eliminations (week 3)
    top_20 character Top 20 eliminations
    top_20_2 character Top 20 eliminations (week 2)
    top_16 character Top 16 eliminations
    top_14 character Top 14 eliminations
    top_13 character Top 13 eliminations
    top_12 character Top 12 eliminations
    top_11 character Top 11 eliminations
    top_11_2 character Top 11 eliminations (week 2)
    wildcard character Wildcard week eliminations
    comeback logical Comeback week eliminations
    top_10 character Top 10 eliminations
    top_9 character Top 9 eliminations
    top_9_2 character Top 9 eliminations (week 2)
    top_8 character Top 8 eliminations
    top_8_2 character Top 8 eliminations (week 2)
    top_7 character Top 7 eliminations
    top_7_2 character Top 7 eliminations (week 2)
    top_6 character Top 6 eliminations
    top_6_2 character Top 6 eliminations (week 2)
    top_5 character Top 5 eliminations
    top_5_2 character Top 5 eliminations (week 2)
    top_4 character Top 4 eliminations
    top_4_2 character Top 4 eliminations (week 2)
    top_3 character Top 3 eliminations
    finale character Finale eliminations

    finalists.csv

    variable class description
    Contestant character Name of contestant
    Birthday character Contestant’s birthday
    Birthplace character Contestant’s city of birth
    Hometown character Contestant’s hometown
    Description character Description of contestant
    Season double Season

    ratings.csv

    variable class description
    season double Season
    show_number double Episode number in season
    episode character Episode name
    airdate character Date episode aired
    18_49_rating_share character Percentage of adults aged 18-49 estimated to have watched the episode (Nielsen TV ratings).
    viewers_in_millions double Number (in millions) that watched the episode
    timeslot_et character Episode timeslot in Eastern Time
    dvr_18_49 character Percentage of adults aged 18-19 estimated to have watched the episode on DVR
    dvr_viewers_millions character Number (in millions) that watched the episode on DVR
    total_18_49 character Total percentage of adults aged 18-49 estimated to have watched the episode
    total_viewers_millions character Total number of viewers (in millions).
    weekrank character Ranking of episode performance by season
    ref logical Reference
    share character share (unused)
    nightlyrank double Nightly ranking
    rating_share_households character Ranking per share of households.
    rating_share character Ratings share.

    seasons.csv

    variable class description
    season double Season
    winner character Name of winner
    runner_up character Name of runner_up
    original_release character Original air dates
    original_network character Network aired on
    hosted_by character Host’s name
    judges character Name of judges
    no_of_episodes double Episode name
    finals_venue character Venue of finale
    mentor character Name of season mentor

    songs.csv

    variable class description
    season character Season Number
    week character Week date and week description
    order double Order contestants sang in
    contestant character Competitor name
    song character Name of song sung
    artist character Name of song’s artist (imputed if not explicitly listed)
    song_theme character Week theme for songs sung
    result character Contestant’s elimination status for the week

    Cleaning Script

    # Clean data provided by <https://github.com/kkakey/American_Idol>. No cleaning was necessary.
    auditions <- readr::read_csv("https://raw.githubusercontent.com/kkakey/American_Idol/main/metadata/auditions.csv")
    eliminations <- readr::read_csv("https://raw.githubusercontent.com/kkakey/American_Idol/main/metadata/elimination_chart.csv")
    finalists <- readr::read_csv("https://raw.githubusercontent.com/kkakey/American_Idol/main/metadata/finalists.csv")
    ratings <- readr::read_csv("https://raw.githubusercontent.com/kkakey/American_Idol/main/metadata/ratings.csv")
    seasons <- readr::read_csv("https://raw.githubusercontent.com/kkakey/American_Idol/main/metadata/seasons.csv")
    songs <- readr::read_csv("https://raw.githubusercontent.com/kkakey/American_Idol/main/Songs/songs_all.csv")