    Bird Sightings at Sea

    This week we’re exploring Bird Sightings at Sea! The data this week comes from Te Papa Tongarewa, The Museum of New Zealand. It consists of log book entries of bird sightings at sea near New Zealand, from 1969 to 1990.

    Thank you to David Hood for the dataset suggestion.

    The data was recorded following the guidelines of the Australasian Seabird Mapping Scheme, and each record counts the seabirds seen from a ship during a 10-minute period. The data includes geolocations of the sightings, bird species, numbers and behaviour, observer and ship name, and observation date and time.

    • The data was recorded by hand and later split into standardized columns. Do the logbook entries in species_common_name in the birds dataset always match the split columns, such as species_scientific_name, age, wan_plumage_phase, plumage_phase, and sex?
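    One way to start on the question above is to look for rows where a derived column and the raw log entry disagree. The sketch below is dependency-free Python and runs on a few made-up rows, not real records. It assumes, hypothetically, that a sex noted in the log shows up as the word "male" or "female" in species_common_name; real entries may encode sex differently, so treat any mismatch as a prompt for inspection rather than a confirmed error.

```python
import re

# Hypothetical sample rows; real rows would come from birds.csv.
sample_rows = [
    {"species_common_name": "Wandering albatross adult male", "sex": "male"},
    {"species_common_name": "Cape petrel", "sex": None},
    {"species_common_name": "Giant petrel female", "sex": None},
    {"species_common_name": None, "sex": None},  # census with no birds recorded
]

# Assumption: sex appears in the entry as a whole word.
SEX_WORD = re.compile(r"\b(female|male)\b", re.IGNORECASE)


def sex_in_entry(entry):
    """Return 'female' or 'male' if the word appears in the log entry, else None."""
    if not entry:
        return None
    match = SEX_WORD.search(entry)
    return match.group(1).lower() if match else None


def find_sex_mismatches(rows):
    """Return rows whose sex column disagrees with the word found in the entry."""
    return [
        row for row in rows
        if sex_in_entry(row["species_common_name"]) != row["sex"]
    ]


mismatches = find_sex_mismatches(sample_rows)
for row in mismatches:
    print(row["species_common_name"], "->", row["sex"])
```

    The same pattern could be repeated for age or plumage_phase by swapping in the values listed in the data dictionary.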

    Thank you to Jon Harmon, Data Science Learning Community for curating this week’s dataset.

    The Data

    # Using R
    # Option 1: tidytuesdayR R package 
    ## install.packages("tidytuesdayR")
    
    tuesdata <- tidytuesdayR::tt_load('2026-04-14')
    ## OR
    tuesdata <- tidytuesdayR::tt_load(2026, week = 15)
    
    beaufort_scale <- tuesdata$beaufort_scale
    birds <- tuesdata$birds
    sea_states <- tuesdata$sea_states
    ships <- tuesdata$ships
    
    # Option 2: Read directly from GitHub
    
    beaufort_scale <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/beaufort_scale.csv')
    birds <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/birds.csv')
    sea_states <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/sea_states.csv')
    ships <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/ships.csv')
    # Using Python
    # Option 1: pydytuesday Python library
    ## pip install pydytuesday
    
    import pydytuesday
    
    # Download files from the week, which you can then read in locally
    pydytuesday.get_date('2026-04-14')
    
    # Option 2: Read directly from GitHub and assign to an object
    ## pip install pandas
    import pandas
    
    beaufort_scale = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/beaufort_scale.csv')
    birds = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/birds.csv')
    sea_states = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/sea_states.csv')
    ships = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/ships.csv')
    # Using Julia
    # Option 1: TidierTuesday.jl library
    ## Pkg.add(url="https://github.com/TidierOrg/TidierTuesday.jl")
    
    using TidierTuesday
    
    # Download datasets for the week, and load them as a NamedTuple of DataFrames
    data = tt_load("2026-04-14")
    
    # Option 2: Read directly from GitHub and assign to an object with TidierFiles
    ## Pkg.add("TidierFiles")
    using TidierFiles
    
    beaufort_scale = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/beaufort_scale.csv")
    birds = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/birds.csv")
    sea_states = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/sea_states.csv")
    ships = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/ships.csv")
    
    # Option 3: Read directly from GitHub and assign without Tidier dependencies
    # CSV.read expects a local file or IO source, so download each file first
    using CSV, DataFrames, Downloads
    beaufort_scale = CSV.read(Downloads.download("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/beaufort_scale.csv"), DataFrame)
    birds = CSV.read(Downloads.download("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/birds.csv"), DataFrame)
    sea_states = CSV.read(Downloads.download("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/sea_states.csv"), DataFrame)
    ships = CSV.read(Downloads.download("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/ships.csv"), DataFrame)

    How to Participate

    • Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
    • Create a visualization, a model, a Quarto report, a Shiny app, or some other piece of data-science-related output, using R, Python, or another programming language.
    • Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
    • Submit your own dataset!

    PydyTuesday: A Posit collaboration with TidyTuesday

    • Exploring the TidyTuesday data in Python? Posit has some extra resources for you! Have you tried making a Quarto dashboard? Find videos and other resources in Posit’s PydyTuesday repo.
    • Share your work with the world using the hashtags #TidyTuesday and #PydyTuesday so that Posit has the chance to highlight your work, too!
    • Deploy or share your work however you want! If you’d like a super easy way to publish your work, give Connect Cloud a try.

    Data Dictionary

    beaufort_scale.csv

    variable | class | description
    wind_speed_class | integer | Beaufort scale class (0–12), corresponding to ships$wind_speed_class.
    wind_description | ordered | Text description of the wind conditions, ordered from "calm" (class 0) to "hurricane" (class 12).
    wind_speed_knots_min | integer | Minimum wind speed in knots for this class.
    wind_speed_knots_max | integer | Maximum wind speed in knots for this class. NA for class 12 (hurricane), which has no upper bound.
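    Because wind_speed_knots_max is NA for class 12, any range lookup has to treat a missing upper bound as open-ended. The following is a minimal sketch using only the Python standard library. The three sample rows use the commonly published Beaufort knot ranges and are an assumption, not values copied from beaufort_scale.csv; the helper names are illustrative.

```python
import csv
import io

# Illustrative excerpt in the shape of beaufort_scale.csv (assumed values).
sample_csv = """wind_speed_class,wind_speed_knots_min,wind_speed_knots_max
10,48,55
11,56,63
12,64,NA
"""


def parse_int_or_na(text):
    """Convert a CSV cell to int, mapping empty cells and 'NA' to None."""
    return None if text in ("", "NA") else int(text)


beaufort_rows = [
    {name: parse_int_or_na(value) for name, value in record.items()}
    for record in csv.DictReader(io.StringIO(sample_csv))
]


def beaufort_class(knots, rows):
    """Return the class whose knot range contains `knots`, or None if no row matches."""
    for row in rows:
        lower = row["wind_speed_knots_min"]
        upper = row["wind_speed_knots_max"]
        # A missing upper bound means the class has no maximum.
        if knots >= lower and (upper is None or knots <= upper):
            return row["wind_speed_class"]
    return None


print(beaufort_class(60, beaufort_rows))
print(beaufort_class(120, beaufort_rows))
```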

    birds.csv

    variable | class | description
    bird_observation_id | integer | Identifier for this bird observation record, corresponding to the original record order in the source data.
    record_id | integer | Record identifier. Links to ships$record_id. One observation (record_id 1184009, bird_observation_id 974) has no corresponding ship record in the source data.
    species_common_name | character | The original log entry as recorded in the source data. May encode species, age, sex, and plumage phase in a single string. NA indicates a census period with no birds recorded (converted from the sentinel value "[NO BIRDS RECORDED]" in the source data).
    species_scientific_name | character | Scientific name, derived from species_common_name by Te Papa staff. May represent a species aggregate if the log entry could not be identified to a single species.
    species_abbreviation | character | Abbreviated species name (mostly the first three letters of the genus and species), derived from species_common_name by Te Papa staff.
    age | ordered | Age class, derived from species_common_name. One of "juvenile", "immature", "subadult", or "adult".
    wan_plumage_phase | ordered | Wandering albatross plumage phase, derived from species_common_name. One of "all brown", "brown plumage breaking", "white patch on wing", "wing patch breaking", or "white".
    plumage_phase | ordered | Plumage phase for species other than wandering albatross, derived from species_common_name. One of "dark", "intermediate", "light", or "white".
    sex | factor | Sex, derived from species_common_name. One of "female" or "male".
    count | integer | Total number of birds counted in this observation. 99999 is used for counts estimated to be over 100,000.
    n_feeding | integer | Number of birds observed feeding (unspecified whether actively or passively). 99999 is used for counts estimated to be over 100,000.
    feeding | logical | Whether any birds were observed feeding.
    n_sitting_on_water | integer | Number of birds sitting on water.
    sitting_on_water | logical | Whether any birds were sitting on water.
    n_sitting_on_ice | integer | Number of birds sitting on ice.
    sitting_on_ice | logical | Whether any birds were sitting on ice.
    sitting_on_ship | logical | Whether any birds were sitting on the ship.
    in_hand | logical | Whether any birds were held in hand (i.e., captured).
    n_flying_past | integer | Number of birds flying past. 99999 is used for counts estimated to be over 100,000.
    flying_past | logical | Whether any birds were flying past.
    n_accompanying | integer | Number of birds accompanying the ship (flying alongside).
    accompanying | logical | Whether any birds were accompanying the ship.
    n_following_ship | integer | Number of birds following the ship’s wake.
    following_ship | logical | Whether any birds were following the ship’s wake.
    moulting | logical | Whether any birds were observed moulting.
    naturally_feeding | logical | Whether any birds were naturally feeding (i.e., not feeding on ship discards).
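    Two details in the birds.csv dictionary are easy to trip over: the 99999 sentinel in the count columns, and the observation whose record_id has no match in ships.csv. The sketch below shows one possible way to handle both with plain Python dictionaries. The rows are made up for illustration, and the function names and the has_ship_record flag are hypothetical, not part of the dataset.

```python
# Per the data dictionary, 99999 marks a very large estimated count, not a tally.
COUNT_SENTINEL = 99999

# Hypothetical rows; real rows would come from birds.csv and ships.csv.
sample_birds = [
    {"bird_observation_id": 1, "record_id": 10, "count": 3},
    {"bird_observation_id": 2, "record_id": 10, "count": 99999},
    {"bird_observation_id": 3, "record_id": 11, "count": None},
    {"bird_observation_id": 4, "record_id": 99, "count": 1},
]
sample_ships = [
    {"record_id": 10, "date": "1975-01-02"},
    {"record_id": 11, "date": "1975-01-03"},
]


def exact_counts(bird_rows):
    """Keep only real tallies: drop missing values and the sentinel."""
    return [
        row["count"] for row in bird_rows
        if row["count"] is not None and row["count"] != COUNT_SENTINEL
    ]


def left_join(bird_rows, ship_rows):
    """Attach ship columns to each bird row, keeping birds without a ship record."""
    ships_by_id = {ship["record_id"]: ship for ship in ship_rows}
    joined = []
    for bird in bird_rows:
        ship = ships_by_id.get(bird["record_id"])
        joined.append({**bird, **(ship or {}), "has_ship_record": ship is not None})
    return joined


joined = left_join(sample_birds, sample_ships)
unmatched_ids = [
    row["bird_observation_id"] for row in joined if not row["has_ship_record"]
]
print(exact_counts(sample_birds), unmatched_ids)
```

    A left join is used here so that the unmatched observation stays visible instead of silently dropping out, as it would with an inner join.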

    sea_states.csv

    variable | class | description
    sea_state_class | integer | Sea state class (0–6), corresponding to ships$sea_state_class.
    sea_state_description | ordered | Text description of the sea state, ordered from "calm, glassy" (class 0) to "very rough" (class 6).
    wave_meters_min | double | Minimum wave height in meters for this class.
    wave_meters_max | double | Maximum wave height in meters for this class.

    ships.csv

    variable | class | description
    record_id | integer | Record identifier. Links to birds$record_id.
    date | date | Observation date.
    time | time | Local time at the start of the 10-minute count.
    latitude | double | Decimal latitude (negative values indicate southern hemisphere).
    longitude | double | Decimal longitude.
    hemisphere | factor | East/West hemisphere: "E" or "W".
    activity | factor | Ship activity during the count period. One of "steaming, sailing", "dropping trash", "trawling", "oceanography", "potting", "line fishing", "cleaning fish", "stationary", "flying helicopters", or "whaling". Recoded from numeric codes 1–10 in the source data.
    speed | double | Ship speed in knots.
    direction | integer | Ship direction in degrees.
    cloud_cover | ordered | Cloud cover: "clear", "partially cloudy", or "overcast". Recoded from codes 0–2 in the source data.
    precipitation | factor | Precipitation type: "none", "squalls", "fog", "drizzle", "rain", "showers", "snow showers", or "continuous snow". Recoded from numeric codes in the source data.
    wind_speed_class | integer | Wind speed on the Beaufort scale (0–12). Join to beaufort_scale for descriptions and knot ranges.
    wind_direction | integer | Wind direction in degrees.
    air_temperature | double | Air temperature in degrees Celsius.
    pressure | integer | Atmospheric sea-level pressure in millibars.
    sea_state_class | integer | Sea state class (0–6). Join to sea_states for descriptions and wave height ranges.
    sea_surface_temperature | double | Sea surface temperature in degrees Celsius.
    depth | integer | Sea floor depth in meters.
    observer | factor | Name of the observer, decoded from a 4-letter code in the source data.
    census_method | ordered | Count method: "partial" indicates a count lasting less than 10 minutes or a casual observation; "full" indicates a complete 10-minute count. Recoded from "P" and "F" in the source data.
    season | ordered | Southern hemisphere season: "summer", "autumn", "winter", or "spring". Recorded directly in the source data (but likely derived rather than being entered directly in the log book data).
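    Since season is described as likely derived, one quick check is to recompute it from date and list the records that disagree. The sketch below assumes seasons follow calendar months (December to February as southern hemisphere summer, and so on). That rule is a guess; the source data may have used a different one. The sample rows are made up.

```python
from datetime import date

# Assumption: month-based seasons for the southern hemisphere.
SEASON_BY_MONTH = {
    12: "summer", 1: "summer", 2: "summer",
    3: "autumn", 4: "autumn", 5: "autumn",
    6: "winter", 7: "winter", 8: "winter",
    9: "spring", 10: "spring", 11: "spring",
}


def southern_season(day):
    """Return the assumed southern hemisphere season for a date."""
    return SEASON_BY_MONTH[day.month]


# Hypothetical rows in the shape of ships.csv (only the columns used here).
sample_ships = [
    {"record_id": 1, "date": "1975-01-15", "season": "summer"},
    {"record_id": 2, "date": "1975-07-04", "season": "winter"},
    {"record_id": 3, "date": "1975-03-01", "season": "summer"},
]


def season_disagreements(rows):
    """Return record_ids whose recorded season differs from the recomputed one."""
    return [
        row["record_id"] for row in rows
        if southern_season(date.fromisoformat(row["date"])) != row["season"]
    ]


disagreements = season_disagreements(sample_ships)
print(disagreements)
```

    If many real records disagree near the start or end of a month, that would suggest the source used a different boundary rule, not that individual entries are wrong.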

    Cleaning Script

    # Clean data provided by the seabirddata package
    # (https://jonthegeek.github.io/seabirddata/index.html). Full cleaning scripts
    # are available on GitHub at
    # https://github.com/jonthegeek/seabirddata/blob/main/data-raw/
    beaufort_scale <- seabirddata::beaufort_scale
    birds <- seabirddata::birds
    ships <- seabirddata::ships
    sea_states <- seabirddata::sea_states