Bird Sightings at Sea
This week we’re exploring Bird Sightings at Sea! The data this week comes from Te Papa Tongarewa, The Museum of New Zealand. It consists of log book entries of bird sightings at sea near New Zealand, from 1969 to 1990.
Thank you to David Hood for the dataset suggestion.
The data was recorded using guidelines for the Australasian Seabird Mapping Scheme and counts seabirds seen from a ship during a 10 minute period. The data includes geolocations of the sightings, bird species, numbers and behaviour, observer and ship name, and observation date and time.
- The data was recorded by hand and split into standardized columns. Do the logbook entries in
species_common_namein thebirdsdataset always match up with the split columns such asspecies_scientific_name,age,wan_plumage_phase,plumage_phase, andsex.
Thank you to Jon Harmon, Data Science Learning Community for curating this week’s dataset.
The Data
# Using R
# Option 1: tidytuesdayR R package
## install.packages("tidytuesdayR")
tuesdata <- tidytuesdayR::tt_load('2026-04-14')
## OR
tuesdata <- tidytuesdayR::tt_load(2026, week = 15)
beaufort_scale <- tuesdata$beaufort_scale
birds <- tuesdata$birds
sea_states <- tuesdata$sea_states
ships <- tuesdata$ships
# Option 2: Read directly from GitHub
beaufort_scale <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/beaufort_scale.csv')
birds <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/birds.csv')
sea_states <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/sea_states.csv')
ships <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/ships.csv')# Using Python
# Option 1: pydytuesday python library
## pip install pydytuesday
import pydytuesday
# Download files from the week, which you can then read in locally
pydytuesday.get_date('2026-04-14')
# Option 2: Read directly from GitHub and assign to an object
beaufort_scale = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/beaufort_scale.csv')
birds = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/birds.csv')
sea_states = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/sea_states.csv')
ships = pandas.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/ships.csv')# Using Julia
# Option 1: TidierTuesday.jl library
## Pkg.add(url="https://github.com/TidierOrg/TidierTuesday.jl")
using TidierTuesday
# Download datasets for the week, and load them as a NamedTuple of DataFrames
data = tt_load("2026-04-14")
# Option 2: Read directly from GitHub and assign to an object with TidierFiles
beaufort_scale = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/beaufort_scale.csv")
birds = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/birds.csv")
sea_states = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/sea_states.csv")
ships = read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/ships.csv")
# Option 3: Read directly from Github and assign without Tidier dependencies
beaufort_scale = CSV.read("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/beaufort_scale.csv", DataFrame)
birds = CSV.read("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/birds.csv", DataFrame)
sea_states = CSV.read("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/sea_states.csv", DataFrame)
ships = CSV.read("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-04-14/ships.csv", DataFrame)How to Participate
- Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
- Create a visualization, a model, a Quarto report, a shiny app, or some other piece of data-science-related output, using R, Python, or another programming language.
- Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
- Submit your own dataset!
PydyTuesday: A Posit collaboration with TidyTuesday
- Exploring the TidyTuesday data in Python? Posit has some extra resources for you! Have you tried making a Quarto dashboard? Find videos and other resources in Posit’s PydyTuesday repo.
- Share your work with the world using the hashtags #TidyTuesday and #PydyTuesday so that Posit has the chance to highlight your work, too!
- Deploy or share your work however you want! If you’d like a super easy way to publish your work, give Connect Cloud a try.
Data Dictionary
beaufort_scale.csv
| variable | class | description |
|---|---|---|
| wind_speed_class | integer | Beaufort scale class (0–12), corresponding to ships$wind_speed_class. |
| wind_description | ordered | Text description of the wind conditions, ordered from "calm" (class 0) to "hurricane" (class 12). |
| wind_speed_knots_min | integer | Minimum wind speed in knots for this class. |
| wind_speed_knots_max | integer | Maximum wind speed in knots for this class. NA for class 12 (hurricane), which has no upper bound. |
birds.csv
| variable | class | description |
|---|---|---|
| bird_observation_id | integer | Identifier for this bird observation record, corresponding to the original record order in the source data. |
| record_id | integer | Record identifier. Links to ships$record_id. One observation (record_id 1184009, bird_observation_id 974) has no corresponding ship record in the source data. |
| species_common_name | character | The original log entry as recorded in the source data. May encode species, age, sex, and plumage phase in a single string. NA indicates a census period with no birds recorded (converted from the sentinel value "[NO BIRDS RECORDED]" in the source data). |
| species_scientific_name | character | Scientific name, derived from species_common_name by Te Papa staff. May represent a species aggregate if the log entry could not be identified to a single species. |
| species_abbreviation | character | Abbreviated species name (mostly the first three letters of the genus and species), derived from species_common_name by Te Papa staff. |
| age | ordered | Age class, derived from species_common_name. One of "juvenile", "immature", "subadult", or "adult". |
| wan_plumage_phase | ordered | Wandering albatross plumage phase, derived from species_common_name. One of "all brown", "brown plumage breaking", "white patch on wing", "wing patch breaking", or "white". |
| plumage_phase | ordered | Plumage phase for species other than wandering albatross, derived from species_common_name. One of "dark", "intermediate", "light", or "white". |
| sex | factor | Sex, derived from species_common_name. One of "female" or "male". |
| count | integer | Total number of birds counted in this observation. 99999 is used for counts estimated to be over 100,000. |
| n_feeding | integer | Number of birds observed feeding (unspecified whether actively or passively). 99999 is used for counts estimated to be over 100,000. |
| feeding | logical | Whether any birds were observed feeding. |
| n_sitting_on_water | integer | Number of birds sitting on water. |
| sitting_on_water | logical | Whether any birds were sitting on water. |
| n_sitting_on_ice | integer | Number of birds sitting on ice. |
| sitting_on_ice | logical | Whether any birds were sitting on ice. |
| sitting_on_ship | logical | Whether any birds were sitting on the ship. |
| in_hand | logical | Whether any birds were held in hand (i.e., captured). |
| n_flying_past | integer | Number of birds flying past. 99999 is used for counts estimated to be over 100,000. |
| flying_past | logical | Whether any birds were flying past. |
| n_accompanying | integer | Number of birds accompanying the ship (flying alongside). |
| accompanying | logical | Whether any birds were accompanying the ship. |
| n_following_ship | integer | Number of birds following the ship’s wake. |
| following_ship | logical | Whether any birds were following the ship’s wake. |
| moulting | logical | Whether any birds were observed moulting. |
| naturally_feeding | logical | Whether any birds were naturally feeding (i.e., not feeding on ship discards). |
sea_states.csv
| variable | class | description |
|---|---|---|
| sea_state_class | integer | Sea state class (0–6), corresponding to ships$sea_state_class. |
| sea_state_description | ordered | Text description of the sea state, ordered from "calm, glassy" (class 0) to "very rough" (class 6). |
| wave_meters_min | double | Minimum wave height in meters for this class. |
| wave_meters_max | double | Maximum wave height in meters for this class. |
ships.csv
| variable | class | description |
|---|---|---|
| record_id | integer | Record identifier. Links to birds$record_id. |
| date | date | Observation date. |
| time | time | Local time at the start of the 10-minute count. |
| latitude | double | Decimal latitude (negative values indicate southern hemisphere). |
| longitude | double | Decimal longitude. |
| hemisphere | factor | East/West hemisphere: "E" or "W". |
| activity | factor | Ship activity during the count period. One of "steaming, sailing", "dropping trash", "trawling", "oceanography", "potting", "line fishing", "cleaning fish", "stationary", "flying helicopters", or "whaling". Recoded from numeric codes 1–10 in the source data. |
| speed | double | Ship speed in knots. |
| direction | integer | Ship direction in degrees. |
| cloud_cover | ordered | Cloud cover: "clear", "partially cloudy", or "overcast". Recoded from codes 0–2 in the source data. |
| precipitation | factor | Precipitation type: "none", "squalls", "fog", "drizzle", "rain", "showers", "snow showers", or "continuous snow". Recoded from numeric codes in the source data. |
| wind_speed_class | integer | Wind speed on the Beaufort scale (0–12). Join to [beaufort_scale] for descriptions and knot ranges. |
| wind_direction | integer | Wind direction in degrees. |
| air_temperature | double | Air temperature in degrees Celsius. |
| pressure | integer | Atmospheric sea-level pressure in millibars. |
| sea_state_class | integer | Sea state class (0–6). Join to [sea_states] for descriptions and wave height ranges. |
| sea_surface_temperature | double | Sea surface temperature in degrees Celsius. |
| depth | integer | Sea floor depth in meters. |
| observer | factor | Name of the observer, decoded from a 4-letter code in the source data. |
| census_method | ordered | Count method: "partial" indicates a count lasting less than 10 minutes or a casual observation; "full" indicates a complete 10-minute count. Recoded from "P" and "F" in the source data. |
| season | ordered | Southern hemisphere season: "summer", "autumn", "winter", or "spring". Recorded directly in the source data (but likely derived rather than being entered directly in the log book data). |
Cleaning Script
# Clean data provided by the seabirddata package
# (https://jonthegeek.github.io/seabirddata/index.html). Full cleaning scripts
# are available on GitHub at
# https://github.com/jonthegeek/seabirddata/blob/main/data-raw/
beaufort_scale <- seabirddata::beaufort_scale
birds <- seabirddata::birds
ships <- seabirddata::ships
sea_states <- seabirddata::sea_states