Selected British Literary Prizes (1990-2022)
This week we are exploring data from the Selected British Literary Prizes (1990-2022) dataset, which comes from the Post45 Data Collective.
“This dataset contains primary categories of information on individual authors comprising gender, sexuality, UK residency, ethnicity, geography and details of educational background, including institutions where the authors acquired their degrees and their fields of study. Along with other similar projects, we aim to provide information to assess the cultural, social and political factors determining literary prestige. Our goal is to contribute to greater transparency in discussions around diversity and equity in literary prize cultures.”
Additional metadata discussion relating to the ethnicity, gender and sexuality, and educational classification variables is available on the Post45 site. Follow them on Bluesky at @post45data.bsky.social, and here on GitHub at @Post45-Data-Collective.
Thank you to Georgios Karamanis for the dataset suggestion!
In relation to ethical considerations, the authors note that…
“All of the information in this dataset is publicly available. Information about a writer’s location, gender identity, race, ethnicity, or education from scholarly and public sources can be sensitive. The data provided here enables the study of broad patterns and is not intended as definitive.”
- In which genres are women, Black, Asian and ethnically diverse writers most likely to be shortlisted and/or awarded?
- Have prizes improved their record on gender and/or ethnic representation in shortlists and awardees?
- Is there a connection between specific educational credentials and/or educational institutions and writers’ chances of being shortlisted or winning?
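As a starting point for these questions, the sketch below computes, for any grouping column, the fraction of rows that satisfy a condition. Examples include the share of entries by women in each `prize_genre` or in each `prize_year`, and the share of winners per `degree_institution`. It uses only the Python standard library and assumes rows come from `csv.DictReader` on a downloaded copy of `prizes.csv`. `share_by` is a hypothetical helper name, and the coded values used in the usage note (`"woman"`, `"winner"`) are assumptions: check the distinct values of `gender` and `person_role` before relying on them.

```python
from collections import defaultdict
from typing import Callable, Iterable, Mapping

Row = Mapping[str, str]


def share_by(
    rows: Iterable[Row],
    group_field: str,
    condition: Callable[[Row], bool],
) -> dict[str, float]:
    """Return, for each value of `group_field`, the fraction of rows where `condition` is true.

    Hypothetical helper. Rows are dicts such as those produced by csv.DictReader.
    Any coded values tested inside `condition` (e.g. "woman", "winner") are
    assumptions about the dataset's coding and should be checked first.
    """
    matches: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for row in rows:
        key = row[group_field]
        totals[key] += 1
        if condition(row):
            matches[key] += 1
    # Sort groups so the output order is stable and easy to read.
    return {key: matches[key] / totals[key] for key in sorted(totals)}
```

For example, `share_by(rows, "prize_genre", lambda r: r["gender"] == "woman")` gives the share of entries per genre whose author is coded as a woman. Grouping by `"prize_year"` instead gives a year-by-year series for the representation question, and `share_by(rows, "degree_institution", lambda r: r["person_role"] == "winner")` gives the proportion of winning entries per institution.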
Thank you to Jen Richmond for curating this week’s dataset.
The Data
```r
# Using R
# Option 1: tidytuesdayR R package
## install.packages("tidytuesdayR")

tuesdata <- tidytuesdayR::tt_load('2025-10-28')
## OR
tuesdata <- tidytuesdayR::tt_load(2025, week = 43)

prizes <- tuesdata$prizes

# Option 2: Read directly from GitHub
prizes <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-10-28/prizes.csv')
```

```python
# Using Python
# Option 1: pydytuesday python library
## pip install pydytuesday
import pydytuesday

# Download files from the week, which you can then read in locally
pydytuesday.get_date('2025-10-28')

# Option 2: Read directly from GitHub and assign to an object
import pandas

prizes = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-10-28/prizes.csv')
```

```julia
# Using Julia
# Option 1: TidierTuesday.jl library
## Pkg.add(url="https://github.com/TidierOrg/TidierTuesday.jl")
using TidierTuesday

# Download files from the week, which you can then read in locally
download_dataset("2025-10-28")

# Option 2: Read directly from GitHub and assign to an object with TidierFiles
using TidierFiles
prizes = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-10-28/prizes.csv")

# Option 3: Read from GitHub and assign without Tidier dependencies
# (CSV.read does not fetch URLs itself, so download to a temporary file first)
using CSV, DataFrames, Downloads
prizes = CSV.read(Downloads.download("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-10-28/prizes.csv"), DataFrame)
```

How to Participate
- Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
- Create a visualization, a model, a Quarto report, a shiny app, or some other piece of data-science-related output, using R, Python, or another programming language.
- Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
- Submit your own dataset!
PydyTuesday: A Posit collaboration with TidyTuesday
- Exploring the TidyTuesday data in Python? Posit has some extra resources for you! Have you tried making a Quarto dashboard? Find videos and other resources in Posit’s PydyTuesday repo.
- Share your work with the world using the hashtags #TidyTuesday and #PydyTuesday so that Posit has the chance to highlight your work, too!
- Deploy or share your work however you want! If you’d like a super easy way to publish your work, give Connect Cloud a try.
Data Dictionary
prizes.csv
| variable | class | description |
|---|---|---|
| prize_id | integer | Unique prize identifier used in the SBLP dataset. |
| prize_alias | character | Name of the prize awarded, regularized to the most current name. |
| prize_name | character | Name of the prize awarded, at the time of award. |
| prize_institution | character | Institution that sponsored the prize. |
| prize_year | integer | Year the prize was awarded. |
| prize_genre | character | Genre category of book that the prize was awarded to. |
| person_id | character | Unique author identifier used in the SBLP dataset, assigned in order of entity entry to the dataset. |
| person_role | character | Whether the author was shortlisted for or won the prize. |
| last_name | character | Family name of author. |
| first_name | character | Given name of author. |
| name | character | Full name of the author, in "family name, given name" format. |
| gender | character | Author’s gender, as self-declared/publicly available. |
| sexuality | character | Author’s sexuality, as self-declared/publicly available. |
| uk_residence | logical | Whether the author held residence status in the UK at the time of data gathering. |
| ethnicity_macro | character | Ethnicity macro category, as created for this dataset. |
| ethnicity | character | Ethnicity as self-declared/publicly available. |
| highest_degree | character | Highest level of post-secondary education. |
| degree_institution | character | Institution from which the highest degree was attained. |
| degree_field_category | character | Degree macro category, as created for this dataset. |
| degree_field | character | Field of study, as self-declared/publicly available. |
| viaf | character | Virtual International Authority File (VIAF) identifier. |
| book_id | character | Unique book identifier used in the SBLP dataset. |
| book_title | character | Title of the awarded or shortlisted book. |
Cleaning Script
```r
# Data obtained from Post45 Data Collective GitHub, no cleaning necessary
prizes <- readr::read_csv("https://raw.githubusercontent.com/Post45-Data-Collective/data/refs/heads/main/british_literary_prizes/british_literary_prizes-1990-2022.csv")
```