    Billboard Hot 100 Number Ones

    This week we are exploring the Billboard Hot 100 Number Ones Database. This workbook contains substantial data about every song to ever top the Billboard Hot 100 between August 4, 1958 and January 11, 2025. It was compiled by Chris Dalla Riva as he wrote the book Uncharted Territory: What Numbers Tell Us about the Biggest Hit Songs and Ourselves. It also often powers his newsletter Can’t Get Much Higher.

    “7 years ago, I decided that I was going to listen to every number one hit. Along the way, I tracked an absurd amount of information about each song. Using that information, I wrote a data-driven history of popular music covering 1958 through today.”

    • Have #1 hits become shorter over time?
    • Does the relation between artist age and chart success change across time?
    • Which keys are most common in #1 hits? Do our key preferences differ by genre?
    • What lyrical topics have dominated #1 hits across different decades?
    • How has the prevalence of explicit content changed over time?
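
    As a starting point for the first and last questions above, here is a minimal, dependency-free sketch that averages a numeric column per decade. The helper name mean_by_decade and the sample rows are made up for illustration; the sketch assumes that date values start with a four-digit year and that missing values appear as blanks or "NA". Averaging a 0/1 dummy such as explicit gives the share of songs flagged in each decade.

```python
from collections import defaultdict
from statistics import mean


def mean_by_decade(rows, column):
    """Average a numeric column per decade of the `date` column.

    `rows` is any iterable of dicts, for example the rows yielded by
    csv.DictReader. Rows with a missing, blank, or "NA" value are skipped.
    """
    buckets = defaultdict(list)
    for row in rows:
        value = row.get(column)
        if value in (None, "", "NA"):
            continue
        year = int(str(row["date"])[:4])  # assumption: date starts with YYYY
        decade = year // 10 * 10
        buckets[decade].append(float(value))
    return {decade: mean(values) for decade, values in sorted(buckets.items())}


# Made-up rows, only to show the expected shape of the input.
sample = [
    {"date": "1958-08-04", "length_sec": "150", "explicit": "0"},
    {"date": "1959-03-09", "length_sec": "170", "explicit": "0"},
    {"date": "2024-05-11", "length_sec": "200", "explicit": "1"},
    {"date": "2025-01-11", "length_sec": "NA", "explicit": "0"},
]

print(mean_by_decade(sample, "length_sec"))  # average song length per decade
print(mean_by_decade(sample, "explicit"))    # share of explicit songs per decade
```

    To run this on the real data, pass the rows of billboard.csv instead of the sample list, for example via csv.DictReader on a downloaded copy of the file.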

    Thank you to Jen Richmond (R-Ladies Sydney) for curating this week’s dataset.

    The Data

    # Using R
    # Option 1: tidytuesdayR R package 
    ## install.packages("tidytuesdayR")
    
    tuesdata <- tidytuesdayR::tt_load('2025-08-26')
    ## OR
    tuesdata <- tidytuesdayR::tt_load(2025, week = 34)
    
    billboard <- tuesdata$billboard
    topics <- tuesdata$topics
    
    # Option 2: Read directly from GitHub
    
    billboard <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-26/billboard.csv')
    topics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-26/topics.csv')
    
    # Using Python
    # Option 1: pydytuesday Python library
    ## pip install pydytuesday
    
    import pydytuesday
    
    # Download files from the week, which you can then read in locally
    pydytuesday.get_date('2025-08-26')
    
    # Option 2: Read directly from GitHub and assign to an object
    ## pip install pandas
    
    import pandas
    
    billboard = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-26/billboard.csv')
    topics = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-26/topics.csv')
    
    # Using Julia
    # Option 1: TidierTuesday.jl library
    ## Pkg.add(url="https://github.com/TidierOrg/TidierTuesday.jl")
    
    using TidierTuesday
    
    # Download files from the week, which you can then read in locally
    # (Julia strings need double quotes; single quotes denote a single character)
    download_dataset("2025-08-26")
    
    # Option 2: Read directly from GitHub and assign to an object with TidierFiles
    
    using TidierFiles
    
    billboard = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-26/billboard.csv")
    topics = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-26/topics.csv")
    
    # Option 3: Read directly from GitHub and assign without Tidier dependencies
    # CSV.read expects a local file or IO source, so download the file first
    
    using CSV, DataFrames, Downloads
    
    billboard = CSV.read(Downloads.download("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-26/billboard.csv"), DataFrame)
    topics = CSV.read(Downloads.download("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-26/topics.csv"), DataFrame)

    How to Participate

    • Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
    • Create a visualization, a model, a Quarto report, a shiny app, or some other piece of data-science-related output, using R, Python, or another programming language.
    • Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
    • Submit your own dataset!

    PydyTuesday: A Posit collaboration with TidyTuesday

    • Exploring the TidyTuesday data in Python? Posit has some extra resources for you! Have you tried making a Quarto dashboard? Find videos and other resources in Posit’s PydyTuesday repo.
    • Share your work with the world using the hashtags #TidyTuesday and #PydyTuesday so that Posit has the chance to highlight your work, too!
    • Deploy or share your work however you want! If you’d like a super easy way to publish your work, give Connect Cloud a try.

    Data Dictionary

    billboard.csv

    variable class description
    song character Song Name
    artist character Artist Name
    date date First Week to Hit Number One
    weeks_at_number_one numeric Collective (Consecutive and Non-Consecutive) Weeks at Number One
    non_consecutive numeric Dummy for if it were number one in non-consecutive weeks
    rating_1 numeric Rating between 1 and 10, inclusive, provided by judge 1
    rating_2 numeric Rating between 1 and 10, inclusive, provided by judge 2
    rating_3 numeric Rating between 1 and 10, inclusive, provided by judge 3
    overall_rating numeric Sample mean of Rating 1, Rating 2, and Rating 3
    divisiveness numeric Average absolute pairwise distance between all ratings, ranging from 0.0 to 6.0, inclusive
    label character Record Label that released the song
    parent_label character The larger label or entity that owns the label that released the song, if any
    cdr_genre character Genre as assigned by Chris Dalla Riva and Vinnie Christopher.
    cdr_style character Sub-genre as assigned by Chris Dalla Riva and Vinnie Christopher.
    discogs_genre character Genre as assigned by Discogs.com. See https://blog.discogs.com/en/genres-and-styles/ for more information.
    discogs_style character Style as assigned by Discogs.com. Style is Discogs’ equivalent of sub-genre. If there is no style listed, “None” is listed. See https://blog.discogs.com/en/genres-and-styles/ for more information.
    artist_structure numeric 0 means it is a group of three or more people. 1 means it is a solo act. 2 means it is a duo. A value ending in .5 (e.g., 1.5) means that there is at least one featured artist listed.
    featured_artists character List of notable featured artists, along with what they did on the track, whether they were credited or not.
    multiple_lead_vocalists numeric Dummy for if the song contains multiple people singing/rapping the lead vocal
    group_named_after_non_lead_singer numeric Dummy for if the group is named after someone who isn’t the lead singer (e.g., The J. Geils Band is named for the lead guitar player, not the singer)
    talent_contestant character Notes if the artist became well known through a television talent competition. If so, the talent competition is listed (e.g., American Idol)
    posthumous numeric Dummy for if the artist were dead when the song got to number one
    artist_place_of_origin character The country that the artist was born in.
    front_person_age numeric Age of the frontperson or bandleader on the song. If there are multiple, the average is taken. This is not necessarily the lead singer. If blank, then the age(s) could not be accurately located or age did not make sense (e.g., the band was animated).
    artist_male numeric Dummy for if the artist were a male or a group of males at the time of the release. If 0, the artist or group was all female. If 1, the artist or group was all male. If 2, the artist has a mix of males and females. If 3, the artist contains at least one non-binary person.
    artist_white numeric Dummy for if the artist was white, meaning that their ancestry was of European origin. If 0, the artist was not white. If 1, the artist was white. If 2, the artist has members that are both white and not white.
    artist_black numeric Dummy for if the artist was black, meaning that their ancestry was of African origin. If 0, the artist was not black. If 1, the artist was black. If 2, the artist has members that are both black and not black.
    songwriters character Songwriters from BMI/ASCAP Songview database. If they do not have the information, it comes from The Billboard Book of Number One Hits by Fred Bronson. If Bronson does not have the information, it comes from Spotify song credits.
    songwriters_w_o_interpolation_sample_credits character Songwriters from “Songwriters” field, excluding credits for writers who are being interpolated or sampled.
    songwriter_male numeric Dummy for if the songwriter were a male or a group of males at the time of the release. If 0, the songwriter or group of songwriters was all female. If 1, the songwriter or group of songwriters was all male. If 2, the songwriters were a mix of males and females. If 3, the songwriters were a mix of males, females, and non-binary persons.
    songwriter_white numeric Dummy for if the songwriters were white, meaning that their ancestry was of European origin. If 0, the songwriters were not white. If 1, the songwriters were white. If 2, the songwriters included both white and non-white members.
    artist_is_a_songwriter numeric Dummy for if the Artist is one of the songwriters
    artist_is_only_songwriter numeric Dummy for if the Artist is the only songwriter
    producers character Producers from Tidal’s production credits. If Tidal doesn’t have the information, it comes from Fred Bronson’s The Billboard Book of Number One Hits. If Bronson does not have the information, it comes from Spotify production credits.
    producer_male numeric Dummy for if the producer were a male or a group of males at the time of the release. If 0, the producer or producers were all female. If 1, the producer or producers were all male. If 2, the producers were a mix of males and females. If 3, the producers were a mix of males, females, and non-binary persons.
    producer_white numeric Dummy for if the producers were white, meaning that their ancestry was of European origin. If 0, the producers were not white. If 1, the producers were white. If 2, the producers were composed of both white and not white persons.
    artist_is_a_producer numeric Dummy for if the Artist is one of the producers
    artist_is_only_producer numeric Dummy for if the artist is the only producer
    songwriter_is_a_producer numeric Dummy for if one of the songwriters is one of the producers
    time_signature character Time Signature
    keys character Key that most captures a song. Key changes are separated by a semicolon. A percent sign (%) indicates that the song never returns to its original key. An ampersand (&) indicates an energy key change, or a key change that takes the song either up one or two half steps.
    simplified_key character If the song has a single key, it is again listed here. If it has multiple keys, “Multiple Keys” is listed unless the only key change is an energy key change. In that case, the first key is listed.
    bpm numeric Beats per minute as provided by Spotify
    energy numeric Energy measure from 0 to 100 as provided by Spotify
    danceability numeric Danceability measure from 0 to 100 as provided by Spotify
    happiness numeric Happiness measure from 0 to 100 as provided by Spotify
    loudness_d_b numeric Loudness measured in decibels as provided by Spotify
    acousticness numeric Probability between 0 and 100 that a song is acoustic from Spotify
    vocally_based numeric Dummy for if the song is largely based around a vocal. This does not indicate if the song has a vocal, just if the arrangement is largely fleshed out by human voice, like “Don’t Worry Be Happy” by Bobby McFerrin or “Blue Moon” by The Marcels. Please note that the vocal can be a sample (e.g., “Slow Jamz” by Kanye West)
    bass_based numeric Dummy for if the song is largely based around a bass guitar or synth. This does not indicate if the song has a bass, just if the arrangement is largely fleshed out by bass.
    guitar_based numeric Dummy for if the song is largely based around a guitar. This does not indicate if the song has a guitar, just if the arrangement is largely fleshed out by guitar.
    piano_keyboard_based numeric Dummy for if the song is largely based around a piano or a keyboard. This does not indicate if the song has a piano or keyboard, just if the arrangement is largely fleshed out by one of those instruments.
    orchestral_strings numeric Dummy for if the song contains a string section
    horns_winds numeric Dummy for if the song contains a horn and/or wind section
    accordion numeric Dummy for if the song contains an accordion
    banjo numeric Dummy for if the song contains a banjo
    bongos numeric Dummy for if the song contains bongos
    clarinet numeric Dummy for if the song contains a clarinet
    cowbell numeric Dummy for if the song contains a cowbell
    falsetto_vocal numeric Dummy for if the song contains a falsetto vocal
    flute_piccolo numeric Dummy for if the song contains a flute or piccolo
    handclaps_snaps numeric Dummy for if the song contains handclaps or snaps
    harmonica numeric Dummy for if the song contains a harmonica
    human_whistling numeric Dummy for if the song contains human whistling
    kazoo numeric Dummy for if the song contains a kazoo
    mandolin numeric Dummy for if the song contains a mandolin
    pedal_lap_steel numeric Dummy for if the song contains a pedal or lap steel
    ocarina numeric Dummy for if the song contains an ocarina
    saxophone numeric Dummy for if the song contains a saxophone
    sitar numeric Dummy for if the song contains a sitar
    trumpet numeric Dummy for if the song contains a trumpet
    ukulele numeric Dummy for if the song contains a ukulele
    violin numeric Dummy for if the song contains a violin
    sound_effects character Notes if the song contains any pre-recorded sound effects and, if so, what they are.
    song_structure character Captures the general structure of the song. ‘A1’ means only verses with no refrain. ‘A2’ means verses with a refrain at the beginning or end. ‘A3’ means a lyrical intro and then verses with a refrain at the beginning or end. ‘A4’ means a lyrical intro and then verses with no refrain. ‘C1’ means verse and chorus. ‘C2’ means a lyrical intro, verse, and chorus. ‘C3’ means verse, pre-chorus, chorus. ‘C4’ means verse, pre-chorus, chorus, post-chorus. ‘C5’ means verse, chorus, post-chorus. ‘C6’ means intro, verse, pre-chorus, chorus. ‘C7’ means intro, verse, pre-chorus, chorus, post-chorus. ‘D1’ means verse with a refrain at the beginning or end and a bridge. ‘D3’ means a lyrical intro then a verse with a refrain at the beginning or end and then a bridge. ‘E1’ means verse, chorus, and bridge. ‘E2’ means verse with a refrain at the beginning or end then a chorus and a bridge. ‘E3’ means a verse, pre-chorus, chorus, and bridge. ‘E4’ means a verse, pre-chorus, chorus, post-chorus, and bridge. ‘E5’ means a verse, chorus, post-chorus, and bridge. ‘E6’ means an Intro, Verse, Chorus, Bridge. ‘E7’ means verse with a refrain at the beginning or end then a pre-chorus, chorus, and bridge. ‘F’ means 4 or more sections. ‘I’ means it is an instrumental. If a structure has a ‘V’ at the end, it means that it is a non-rap song with a single rap verse.
    rap_verse_in_a_non_rap_song numeric Dummy for if the song contains at least one rapped verse despite not being a rap song
    length_sec numeric Song length in seconds
    instrumental numeric Dummy for if it is an instrumental. Note that if a song contains little to no vocals, it is still considered an instrumental (e.g., “Harlem Shake” by Baauer)
    instrumental_length_sec numeric Length of time, in seconds, that does not contain a vocalist singing lyrics
    intro_length_sec numeric Length of time, in seconds, before the first verse or chorus at the beginning of the song. Hooks are classified as introductions even if they repeat throughout.
    vocal_introduction numeric Dummy for if a song contains a vocal introduction, meaning a vocal section that appears at the beginning of the song and then not again.
    free_time_vocal_introduction numeric Dummy for if a song contains a free time vocal introduction, meaning a vocal section with tempo rubato that appears at the beginning of the song and then not again.
    fade_out numeric Dummy for if the song fades out
    live numeric Dummy for if the song is a live recording from a concert
    cover numeric Dummy for if the song is a cover, meaning the artist did not write the song and they were not the first person/group to record the song. If a song is an English translation of a song in another language, it is still considered a cover.
    sample numeric Dummy for if the song samples the recording of another song. This does not include interpolation (re-recording parts of other songs).
    interpolation numeric Dummy for if a song uses a musical or lyrical element from another existing song but re-records it themselves and is not a complete cover.
    inspired_by_a_different_song numeric Dummy for if a song is a cover or contains sampled/interpolated elements
    lyrics character Text of lyrics as provided by http://azlyrics.com. If http://azlyrics.com did not have the lyrics, then http://genius.com is used.
    lyrical_topic character Main topic of lyrics as assigned by the author. For a list of topics, see topics.csv.
    lyrical_narrative numeric Dummy for if the lyrics follow a narrative, meaning a story with a loose beginning, middle, and end
    spoken_word numeric Dummy for if the song contains spoken word lyrics. Note this does not include rapped verses.
    explicit numeric Dummy for if Spotify labels the song as explicit, if the song contains expletives, or if it is overly sexual, violent, or related to drugs for the time of its release. A word is considered an expletive if it is one of the following: ‘ass’, ‘bastard’, ‘bitch’, ‘cock’, ‘cunt’, ‘damn’, ‘dick’, ‘faggot’, ‘fuck’, ‘hell’, ‘piss’, ‘shit’, ‘twat’, all racial epithets, and all derivative words and phrases from this list.
    foreign_language numeric Dummy for if there are any non-English lyrics.
    written_for_a_play numeric Dummy for if the song was originally written for a play
    featured_in_a_then_contemporary_play character Notes if the song was featured in a play around the time it topped the charts. If so, the play is listed. Please note that this does not mean the song was written for the play.
    written_for_a_film numeric Dummy for if the song was originally written for a film
    featured_in_a_then_contemporary_film character Notes if the song was featured in a film around the time it topped the charts. If so, the film is listed. Please note that this does not mean the song was written for the film.
    written_for_a_t_v_show numeric Dummy for if the song was originally written for a T.V. show
    featured_in_a_then_contemporary_t_v_show character Notes if the song was featured in a T.V. show around the time it topped the charts. If so, the T.V. show is listed. Please note that this does not mean the song was written for the T.V. show.
    associated_with_dance numeric Dummy for if the song is known to have inspired people to do a certain dance while it plays
    topped_the_charts_by_multiple_artist numeric Dummy for if separate recordings of the song got to number one by different artists
    double_a_side character Notes if the song were considered a double A-sided single. If so, the other side of the record is listed.
    eurovision_entry numeric Dummy for if the song was entered into the annual Eurovision music competition
    u_s_artwork character Description of U.S. Single Artwork. “Cannot Find” means that the artwork could not be reliably located. All artwork before the year 2000 was located on https://discogs.com. Because digital music rose around that time, I began consulting digital music stores and streaming services for artwork after 2000. If you would like to download the artwork, please follow this link.
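
    The overall_rating and divisiveness definitions above can be checked by hand. The sketch below recomputes both from three ratings, assuming that "average absolute pairwise distance" means the mean of |a - b| over every pair of ratings; the function names are illustrative and not part of the dataset. Under that reading, the most divisive combination on a 1 to 10 scale is two judges at one extreme and one at the other, which gives the documented upper bound of 6.0.

```python
from itertools import combinations
from statistics import mean


def overall_rating(ratings):
    """Sample mean of the individual judge ratings."""
    return mean(ratings)


def divisiveness(ratings):
    """Mean absolute difference over every pair of ratings (assumed formula)."""
    return mean(abs(a - b) for a, b in combinations(ratings, 2))


# Made-up ratings for illustration.
agreeing = (7.0, 8.0, 9.0)
split = (1.0, 10.0, 10.0)

print(overall_rating(agreeing), divisiveness(agreeing))
print(overall_rating(split), divisiveness(split))  # divisiveness is 6.0 here
```

    Comparing these recomputed values with the stored overall_rating and divisiveness columns is a quick way to confirm (or refute) the assumed formula on the real data.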

    topics.csv

    variable class description
    lyrical_topics character Main topic of the lyrics
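
    One way to combine the two files is to tally lyrical_topic per decade and optionally keep only the topics listed in topics.csv. The sketch below is a minimal illustration under two assumptions: each song carries a single topic string in lyrical_topic, and date values start with a four-digit year. The helper name topic_counts_by_decade and the topic labels in the sample are placeholders, not values taken from the dataset.

```python
from collections import Counter, defaultdict


def topic_counts_by_decade(songs, known_topics=None):
    """Count lyrical topics per decade.

    `songs` is an iterable of dicts with `date` and `lyrical_topic` keys.
    If `known_topics` is given, topics outside that set are ignored.
    """
    counts = defaultdict(Counter)
    for song in songs:
        topic = (song.get("lyrical_topic") or "").strip()
        if not topic or topic == "NA":
            continue
        if known_topics is not None and topic not in known_topics:
            continue
        decade = int(str(song["date"])[:4]) // 10 * 10  # assumes YYYY prefix
        counts[decade][topic] += 1
    return dict(sorted(counts.items()))


# Placeholder rows; the topic labels are invented for the example.
songs = [
    {"date": "1964-02-01", "lyrical_topic": "Love"},
    {"date": "1967-07-01", "lyrical_topic": "Love"},
    {"date": "1969-01-01", "lyrical_topic": "Dancing"},
    {"date": "1983-01-01", "lyrical_topic": ""},
    {"date": "1985-06-01", "lyrical_topic": "Dancing"},
]

for decade, counter in topic_counts_by_decade(songs).items():
    print(decade, counter.most_common(1))
```

    Passing the lyrical_topics column of topics.csv as known_topics restricts the tally to the documented topic list.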

    Cleaning Script

    # The data arrives mostly clean from Chris Dalla Riva's
    # Billboard Hot 100 Number Ones Database (a Google Sheet)
    
    library(googlesheets4)
    library(janitor)
    
    billboard <- read_sheet(
      "https://docs.google.com/spreadsheets/d/1j1AUgtMnjpFTz54UdXgCKZ1i4bNxFjf01ImJ-BqBEt0/edit?gid=1974823090#gid=1974823090",
      sheet = 2,
      na = c("", "N/A")
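    ) %>%
      # Convert the sheet's column headers to snake_case
      clean_names() %>%
      # Flatten `song` to a plain vector (presumably read as a list-column)
      dplyr::mutate(song = unlist(song))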
    
    topics <- read_sheet("https://docs.google.com/spreadsheets/d/1j1AUgtMnjpFTz54UdXgCKZ1i4bNxFjf01ImJ-BqBEt0/edit?gid=1974823090#gid=1974823090", sheet = 4) %>%
      clean_names()
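
    The clean_names() step explains column names such as loudness_d_b and written_for_a_t_v_show. The sketch below is a rough Python approximation of that kind of snake_case conversion, not janitor's actual algorithm, which handles many more cases (transliteration, duplicate names, and so on). The function name approx_clean_name is made up, and the example headers are guesses inferred from the cleaned names rather than the sheet's verified headers.

```python
import re


def approx_clean_name(header):
    """Approximate snake_case conversion of a spreadsheet column header."""
    # Split lower-to-upper transitions, e.g. "dB" becomes "d_B".
    name = re.sub(r"(?<=[a-z])(?=[A-Z])", "_", header)
    # Replace each run of non-alphanumeric characters with one underscore.
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    # Drop leading/trailing underscores and lowercase the result.
    return name.strip("_").lower()


# Hypothetical original headers, inferred from the cleaned column names.
for header in ["Weeks at Number One", "Loudness (dB)", "Written for a T.V. Show"]:
    print(header, "->", approx_clean_name(header))
```

    A helper like this can be handy when reading the original workbook in Python and wanting column names that line up with the data dictionary.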