Billboard Hot 100 Number Ones
This week we are exploring the Billboard Hot 100 Number Ones Database. This workbook contains substantial data about every song to ever top the Billboard Hot 100 between August 4, 1958 and January 11, 2025. It was compiled by Chris Dalla Riva as he wrote the book Uncharted Territory: What Numbers Tell Us about the Biggest Hit Songs and Ourselves. It also often powers his newsletter Can’t Get Much Higher.
7 years ago, I decided that I was going to listen to every number one hit. Along the way, I tracked an absurd amount of information about each song. Using that information, I wrote a data-driven history of popular music covering 1958 through today.
- Have #1 hits become shorter over time?
- Does the relation between artist age and chart success change across time?
- Which keys are most common in #1 hits? Do our key preferences differ by genre?
- What lyrical topics have dominated #1 hits across different decades?
- How has the prevalence of explicit content changed over time?
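
As a starting point for the first question, here is a minimal Python sketch that averages `length_sec` by decade. It uses only the standard library and a few made-up rows so that it runs offline. The helper name `mean_length_by_decade` and the sample values are illustrative assumptions, not part of the dataset or of any package, and the code assumes `date` begins with a four-digit year.

```python
from collections import defaultdict
from statistics import mean

def mean_length_by_decade(rows):
    """Average `length_sec` per decade, skipping rows with a missing length."""
    by_decade = defaultdict(list)
    for row in rows:
        length = row.get("length_sec")
        if length in (None, ""):
            continue
        year = int(str(row["date"])[:4])  # assumes the date starts with YYYY
        by_decade[year // 10 * 10].append(float(length))
    return {decade: mean(vals) for decade, vals in sorted(by_decade.items())}

# Made-up rows for illustration only; real rows come from billboard.csv.
sample = [
    {"date": "1965-03-01", "length_sec": 150},
    {"date": "1968-07-15", "length_sec": 170},
    {"date": "1994-02-10", "length_sec": 260},
]
print(mean_length_by_decade(sample))
```

With the real file, the rows could come from `csv.DictReader` over a locally downloaded `billboard.csv`; the values then arrive as strings, which the `float()` call handles.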
Thank you to Jen Richmond (R-Ladies Sydney) for curating this week’s dataset.
The Data
# Using R
# Option 1: tidytuesdayR R package
## install.packages("tidytuesdayR")
tuesdata <- tidytuesdayR::tt_load('2025-08-26')
## OR
tuesdata <- tidytuesdayR::tt_load(2025, week = 34)
billboard <- tuesdata$billboard
topics <- tuesdata$topics
# Option 2: Read directly from GitHub
billboard <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-26/billboard.csv')
topics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-26/topics.csv')
# Using Python
# Option 1: pydytuesday python library
## pip install pydytuesday
import pydytuesday
# Download files from the week, which you can then read in locally
pydytuesday.get_date('2025-08-26')
# Option 2: Read directly from GitHub and assign to an object
import pandas
billboard = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-26/billboard.csv')
topics = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-26/topics.csv')
# Using Julia
# Option 1: TidierTuesday.jl library
## Pkg.add(url="https://github.com/TidierOrg/TidierTuesday.jl")
using TidierTuesday
# Download files from the week, which you can then read in locally
download_dataset("2025-08-26")
# Option 2: Read directly from GitHub and assign to an object with TidierFiles
using TidierFiles
billboard = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-26/billboard.csv")
topics = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-26/topics.csv")
# Option 3: Read directly from Github and assign without Tidier dependencies
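using CSV, DataFrames, Downloads
# Download each file to a temporary path first, then parse it with CSV.jl
billboard = CSV.read(Downloads.download("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-26/billboard.csv"), DataFrame)
topics = CSV.read(Downloads.download("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-26/topics.csv"), DataFrame)
How to Participate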
- Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
- Create a visualization, a model, a Quarto report, a shiny app, or some other piece of data-science-related output, using R, Python, or another programming language.
- Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
- Submit your own dataset!
PydyTuesday: A Posit collaboration with TidyTuesday
- Exploring the TidyTuesday data in Python? Posit has some extra resources for you! Have you tried making a Quarto dashboard? Find videos and other resources in Posit’s PydyTuesday repo.
- Share your work with the world using the hashtags #TidyTuesday and #PydyTuesday so that Posit has the chance to highlight your work, too!
- Deploy or share your work however you want! If you’d like a super easy way to publish your work, give Connect Cloud a try.
Data Dictionary
billboard.csv
| variable | class | description |
|---|---|---|
| song | character | Song Name |
| artist | character | Artist Name |
| date | date | First Week to Hit Number One |
| weeks_at_number_one | numeric | Collective (Consecutive and Non-Consecutive) Weeks at Number One |
| non_consecutive | numeric | Dummy for if it were number one in non-consecutive weeks |
| rating_1 | numeric | Rating between 1 and 10, inclusive, provided by judge 1 |
| rating_2 | numeric | Rating between 1 and 10, inclusive, provided by judge 2 |
| rating_3 | numeric | Rating between 1 and 10, inclusive, provided by judge 3 |
| overall_rating | numeric | Sample mean of Rating 1, Rating 2, and Rating 3 |
| divisiveness | numeric | Average absolute pairwise distance between all ratings, ranging from 0.0 to 6.0, inclusive |
| label | character | Record Label that released the song |
| parent_label | character | The larger label or entity that owns the label that released the song, if any |
| cdr_genre | character | Genre as assigned by Chris Dalla Riva and Vinnie Christopher. |
| cdr_style | character | Sub-genre as assigned by Chris Dalla Riva and Vinnie Christopher. |
| discogs_genre | character | Genre as assigned by Discogs.com. See https://blog.discogs.com/en/genres-and-styles/ for more information. |
| discogs_style | character | Style as assigned by Discogs.com. Style is Discogs’ equivalent of sub-genre. If there is no style listed, “None” is listed. See https://blog.discogs.com/en/genres-and-styles/ for more information. |
| artist_structure | numeric | 0 means it is a group of three or more people. 1 means it is a solo act. 2 means it is a duo. If the number is followed by 0.5, then it means that there is at least one featured artist listed. |
| featured_artists | character | List of notable featured artists, along with what they did on the track, whether they were credited or not. |
| multiple_lead_vocalists | numeric | Dummy for if the song contains multiple people singing/rapping the lead vocal |
| group_named_after_non_lead_singer | numeric | Dummy for if the group is named after someone who isn’t the lead singer (e.g., The J. Geils Band is named for the lead guitar player, not the singer) |
| talent_contestant | character | Notes if the artist became well known through a television talent competition. If so, the talent competition is listed (e.g., American Idol) |
| posthumous | numeric | Dummy for if the artist were dead when the song got to number one |
| artist_place_of_origin | character | The country that the artist was born in. |
| front_person_age | numeric | Age of the frontperson or bandleader on the song. If there are multiple, the average is taken. This is not necessarily the lead singer. If blank, then the age(s) could not be accurately located or age did not make sense (e.g., the band was animated). |
| artist_male | numeric | Dummy for if the artist were a male or a group of males at the time of the release. If 0, the artist or group was all female. If 1, the artist or group was all male. If 2, the artist has a mix of males and females. If 3, the artist contains at least one non-binary person. |
| artist_white | numeric | Dummy for if the artist was white, meaning that their ancestry was of European origin. If 0, the artist was not white. If 1, the artist was white. If 2, the artist has members that are both white and not white. |
| artist_black | numeric | Dummy for if the artist was black, meaning that their ancestry was of African origin. If 0, the artist was not black. If 1, the artist was black. If 2, the artist has members that are both black and not black. |
| songwriters | character | Songwriters from BMI/ASCAP Songview database. If they do not have the information, it comes from The Billboard Book of Number One Hits by Fred Bronson. If Bronson does not have the information, it comes from Spotify song credits. |
| songwriters_w_o_interpolation_sample_credits | character | Songwriters from “Songwriters” field, excluding credits for writers who are being interpolated or sampled. |
| songwriter_male | numeric | Dummy for if the songwriter were a male or a group of males at the time of the release. If 0, the songwriter or group of songwriters was all female. If 1, the songwriter or group of songwriters was all male. If 2, the songwriters were a mix of males and females. If 3, the songwriters were a mix of males, females, and non-binary persons. |
| songwriter_white | numeric | Dummy for if the songwriters were white, meaning that their ancestry was of European origin. If 0, the songwriters were not white. If 1, the songwriters were white. If 2, the songwriters included both white and non-white members. |
| artist_is_a_songwriter | numeric | Dummy for if the Artist is one of the songwriters |
| artist_is_only_songwriter | numeric | Dummy for if the Artist is the only songwriter |
| producers | character | Producers from Tidal’s production credits. If Tidal doesn’t have the information, it comes from Fred Bronson’s The Billboard Book of Number One Hits. If Bronson does not have the information, it comes from Spotify production credits. |
| producer_male | numeric | Dummy for if the producer were a male or a group of males at the time of the release. If 0, the producer or producers were all female. If 1, the producer or producers were all male. If 2, the producers were a mix of males and females. If 3, the producers were a mix of males, females, and non-binary persons. |
| producer_white | numeric | Dummy for if the producers were white, meaning that their ancestry was of European origin. If 0, the producers were not white. If 1, the producers were white. If 2, the producers were composed of both white and not white persons. |
| artist_is_a_producer | numeric | Dummy for if the Artist is one of the producers |
| artist_is_only_producer | numeric | Dummy for if the artist is the only producer |
| songwriter_is_a_producer | numeric | Dummy for if one of the songwriters is one of the producers |
| time_signature | character | Time Signature |
| keys | character | Key that most captures a song. Key changes are separated by a semi-colon. A percent sign (%) indicates that the song never returns to its original key. An ampersand (&) indicates an energy key change, or a key change that takes the song either up one or two half steps. |
| simplified_key | character | If the song has a single key, it is again listed here. If it has multiple keys, “Multiple Keys” is listed unless the only key change is an energy key change. In that case, the first key is listed. |
| bpm | numeric | Beats per minute as provided by Spotify |
| energy | numeric | Energy measure from 0 to 100 as provided by Spotify |
| danceability | numeric | Danceability measure from 0 to 100 as provided by Spotify |
| happiness | numeric | Happiness measure from 0 to 100 as provided by Spotify |
| loudness_d_b | numeric | Loudness measured in decibels as provided by Spotify |
| acousticness | numeric | Probability between 0 and 100 that a song is acoustic from Spotify |
| vocally_based | numeric | Dummy for if the song is largely based around a vocal. This does not indicate if the song has a vocal, just if the arrangement is largely fleshed out by human voice, like “Don’t Worry Be Happy” by Bobby McFerrin or “Blue Moon” by The Marcels. Please note that the vocal can be a sample (e.g., “Slow Jamz” by Kanye West) |
| bass_based | numeric | Dummy for if the song is largely based around a bass guitar or synth. This does not indicate if the song has a bass, just if the arrangement is largely fleshed out by bass. |
| guitar_based | numeric | Dummy for if the song is largely based around a guitar. This does not indicate if the song has a guitar, just if the arrangement is largely fleshed out by guitar. |
| piano_keyboard_based | numeric | Dummy for if the song is largely based around a piano or a keyboard. This does not indicate if the song has a piano or keyboard, just if the arrangement is largely fleshed out by one of those instruments. |
| orchestral_strings | numeric | Dummy for if the song contains a string section |
| horns_winds | numeric | Dummy for if the song contains a horn and/or wind section |
| accordion | numeric | Dummy for if the song contains an accordion |
| banjo | numeric | Dummy for if the song contains a banjo |
| bongos | numeric | Dummy for if the song contains bongos |
| clarinet | numeric | Dummy for if the song contains a clarinet |
| cowbell | numeric | Dummy for if the song contains a cowbell |
| falsetto_vocal | numeric | Dummy for if the song contains a falsetto vocal |
| flute_piccolo | numeric | Dummy for if the song contains a flute or piccolo |
| handclaps_snaps | numeric | Dummy for if the song contains handclaps or snaps |
| harmonica | numeric | Dummy for if the song contains a harmonica |
| human_whistling | numeric | Dummy for if the song contains human whistling |
| kazoo | numeric | Dummy for if the song contains a kazoo |
| mandolin | numeric | Dummy for if the song contains a mandolin |
| pedal_lap_steel | numeric | Dummy for if the song contains a pedal or lap steel |
| ocarina | numeric | Dummy for if the song contains an ocarina |
| saxophone | numeric | Dummy for if the song contains a saxophone |
| sitar | numeric | Dummy for if the song contains a sitar |
| trumpet | numeric | Dummy for if the song contains a trumpet |
| ukulele | numeric | Dummy for if the song contains a ukulele |
| violin | numeric | Dummy for if the song contains a violin |
| sound_effects | character | Notes if the song contains any pre-recorded sound effects and, if so, what they are. |
| song_structure | character | Captures the general structure of the song. ‘A1’ means only verses with no refrain. ‘A2’ means verses with a refrain at the beginning or end. ‘A3’ means a lyrical intro and then verses with a refrain at the beginning or end. ‘A4’ means a lyrical intro and then verses with no refrain. ‘C1’ means verse and chorus. ‘C2’ means a lyrical intro, verse, and chorus. ‘C3’ means verse, pre-chorus, chorus. ‘C4’ means verse, pre-chorus, chorus, post-chorus. ‘C5’ means verse, chorus, post-chorus. ‘C6’ means intro, verse, pre-chorus, chorus. ‘C7’ means intro, verse, pre-chorus, chorus, post-chorus. ‘D1’ means verse with a refrain at the beginning or end and a bridge. ‘D3’ means a lyrical intro then a verse with a refrain at the beginning or end and then a bridge. ‘E1’ means verse, chorus, and bridge. ‘E2’ means verse with a refrain at the beginning or end then a chorus and a bridge. ‘E3’ means a verse, pre-chorus, chorus, and bridge. ‘E4’ means a verse, pre-chorus, chorus, post-chorus, and bridge. ‘E5’ means a verse, chorus, post-chorus, and bridge. ‘E6’ means an Intro, Verse, Chorus, Bridge. ‘E7’ means verse with a refrain at the beginning or end then a pre-chorus, chorus, and bridge. ‘F’ means 4 or more sections. ‘I’ means it is an instrumental. If a structure has a ‘V’ at the end, it means that it is a non-rap song with a single rap verse. |
| rap_verse_in_a_non_rap_song | numeric | Dummy for if the song contains at least one rapped verse despite not being a rap song |
| length_sec | numeric | Song length in seconds |
| instrumental | numeric | Dummy for if it is an instrumental. Note that if a song contains little to no vocals, it is still considered an instrumental (e.g., “Harlem Shake” by Baauer) |
| instrumental_length_sec | numeric | Length of time, in seconds, that does not contain a vocalist singing lyrics |
| intro_length_sec | numeric | Length of time, in seconds, before the first verse or chorus at the beginning of the song. Hooks are classified as introductions even if they repeat throughout. |
| vocal_introduction | numeric | Dummy for if a song contains a vocal introduction, meaning a vocal section that appears at the beginning of the song and then not again. |
| free_time_vocal_introduction | numeric | Dummy for if a song contains a free time vocal introduction, meaning a vocal section with tempo rubato that appears at the beginning of the song and then not again. |
| fade_out | numeric | Dummy for if the song fades out |
| live | numeric | Dummy for if the song is a live recording from a concert |
| cover | numeric | Dummy for if the song is a cover, meaning the artist did not write the song and they were not the first person/group to record the song. If a song is an English translation of a song in another language, it is still considered a cover. |
| sample | numeric | Dummy for if the song samples the recording of another song. This does not include interpolation/re-recording of parts of other songs. |
| interpolation | numeric | Dummy for if a song uses a musical or lyrical element from another existing song but re-records it themselves and is not a complete cover. |
| inspired_by_a_different_song | numeric | Dummy for if a song is a cover or contains sampled/interpolated elements |
| lyrics | character | Text of lyrics as provided by http://azlyrics.com. If http://azlyrics.com did not have the lyrics, then http://genius.com is used. |
| lyrical_topic | character | Main topic of lyrics as assigned by Author. For a list of topics, see |
| lyrical_narrative | numeric | Dummy for if the lyrics follow a narrative, meaning a story with a loose beginning, middle, and end |
| spoken_word | numeric | Dummy for if the song contains spoken word lyrics. Note this does not include rapped verses. |
| explicit | numeric | Dummy for if Spotify labels the song as explicit, if the song contains expletives, or if it is overly sexual, violent, or related to drugs for the time of its release. A word is considered an expletive if it is one of the following: ‘ass’, ‘bastard’, ‘bitch’, ‘cock’, ‘cunt’, ‘damn’, ‘dick’, ‘faggot’, ‘fuck’, ‘hell’, ‘piss’, ‘shit’, ‘twat’, all racial epithets, and all derivative words and phrases from this list. |
| foreign_language | numeric | Dummy for if there are any non-English lyrics. |
| written_for_a_play | numeric | Dummy for if the song was originally written for a play |
| featured_in_a_then_contemporary_play | character | Notes if the song was featured in a play around the time it topped the charts. If so, the play is listed. Please note that this does not mean the song was written for the play. |
| written_for_a_film | numeric | Dummy for if the song was originally written for a film |
| featured_in_a_then_contemporary_film | character | Notes if the song was featured in a film around the time it topped the charts. If so, the film is listed. Please note that this does not mean the song was written for the film. |
| written_for_a_t_v_show | numeric | Dummy for if the song was originally written for a T.V. show |
| featured_in_a_then_contemporary_t_v_show | character | Notes if the song was featured in a T.V. show around the time it topped the charts. If so, the T.V. show is listed. Please note that this does not mean the song was written for the T.V. show. |
| associated_with_dance | numeric | Dummy for if the song is known to have inspired people to do a certain dance while it plays |
| topped_the_charts_by_multiple_artist | numeric | Dummy for if separate recordings of the song got to number one by different artists |
| double_a_side | character | Notes if the song were considered a double A-sided single. If so, the other side of the record is listed. |
| eurovision_entry | numeric | Dummy for if the song was entered into the annual Eurovision music competition |
| u_s_artwork | character | Description of U.S. Single Artwork. “Cannot Find” means that the artwork could not be reliably located. All artwork before the year 2000 was located on https://discogs.com. Since digital music rose around that time, I began consulting digital music stores and streaming services for artwork after 2000. If you would like to download the artwork, please follow this link. |
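
Two of the encodings in the table above are easier to see in code. The sketch below recomputes `overall_rating` and `divisiveness` from three ratings, following the descriptions in the dictionary (sample mean, and mean absolute difference over every pair), and decodes `artist_structure` into an act type plus a featured-artist flag. The function names are illustrative assumptions, and whether the author computed the columns in exactly this way is not confirmed by the source.

```python
from itertools import combinations
from statistics import mean

def overall_rating(ratings):
    """Sample mean of the judges' ratings."""
    return mean(ratings)

def divisiveness(ratings):
    """Mean absolute difference over every pair of ratings."""
    return mean(abs(a - b) for a, b in combinations(ratings, 2))

def decode_artist_structure(code):
    """Split the numeric code into (act type, has featured artist)."""
    kinds = {0: "group", 1: "solo", 2: "duo"}
    base = int(code)                 # whole part: 0, 1, or 2
    featured = (code - base) == 0.5  # a trailing .5 marks a featured artist
    return kinds[base], featured

# Ratings of 1, 1, and 10 are as far apart as three 1-10 ratings can be.
print(divisiveness([1, 1, 10]))
print(decode_artist_structure(1.5))
```

The most divisive possible trio of ratings gives pairwise distances of 0, 9, and 9, whose mean is 6, which matches the stated upper bound of 6.0.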
topics.csv
| variable | class | description |
|---|---|---|
| lyrical_topics | character | Main topic of the lyrics |
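
One way to use `topics.csv` is as a reference list for the `lyrical_topic` column in `billboard.csv`. The sketch below counts topics and flags any value that does not appear in the reference list. It assumes each song carries a single topic string, which the source does not confirm, and the function name and sample topic values are made up for illustration.

```python
from collections import Counter

def topic_counts(billboard_rows, known_topics):
    """Count `lyrical_topic` values and collect any not found in topics.csv."""
    known = {t.strip().lower() for t in known_topics}
    counts = Counter()
    unknown = set()
    for row in billboard_rows:
        topic = (row.get("lyrical_topic") or "").strip()
        if not topic:
            continue  # skip songs with no topic recorded
        counts[topic] += 1
        if topic.lower() not in known:
            unknown.add(topic)
    return counts, unknown

# Made-up values for illustration only.
rows = [
    {"lyrical_topic": "Love"},
    {"lyrical_topic": "Love"},
    {"lyrical_topic": "Dancing"},
    {"lyrical_topic": ""},
]
counts, unknown = topic_counts(rows, ["Love"])
print(counts.most_common(), unknown)
```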
Cleaning Script
# Mostly clean data from Chris Dalla Riva's Billboard Hot 100 Number Ones Database Google Sheet
library(googlesheets4)
library(janitor)
billboard <- read_sheet(
"https://docs.google.com/spreadsheets/d/1j1AUgtMnjpFTz54UdXgCKZ1i4bNxFjf01ImJ-BqBEt0/edit?gid=1974823090#gid=1974823090",
sheet = 2,
na = c("", "N/A")
) %>%
clean_names() %>%
dplyr::mutate(song = unlist(song))
topics <- read_sheet("https://docs.google.com/spreadsheets/d/1j1AUgtMnjpFTz54UdXgCKZ1i4bNxFjf01ImJ-BqBEt0/edit?gid=1974823090#gid=1974823090", sheet = 4) %>%
clean_names()
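
The `clean_names()` step is what produces snake_case column names such as `loudness_d_b` and `written_for_a_t_v_show`. A rough Python approximation of that behaviour is sketched below. It is not janitor's actual algorithm, the helper name `clean_name` is hypothetical, and the header strings in the example are guesses at what the sheet's columns might be called rather than values taken from the source.

```python
import re

def clean_name(name):
    """Approximate snake_case cleaning: split camelCase, squash punctuation."""
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)  # "dB" -> "d_B"
    s = re.sub(r"[^A-Za-z0-9]+", "_", s)              # punctuation/spaces -> "_"
    return s.strip("_").lower()

# Guessed header strings, for illustration only.
for header in ["Loudness (dB)", "Written for a T.V. Show", "U.S. Artwork"]:
    print(header, "->", clean_name(header))
```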