    Carbon Majors Emissions Data

    This week we’re exploring historical emissions data from Carbon Majors. They have compiled a database of emissions data going back to 1854. h/t Data is Plural.

    Carbon Majors is a database of historical production data from 122 of the world’s largest oil, gas, coal, and cement producers. This data is used to quantify the direct operational emissions and emissions from the combustion of marketed products that can be attributed to these entities. These entities include:

    • 75 Investor-owned Companies
    • 36 State-owned Companies
    • 11 Nation States
    • 82 Oil Producing Entities
    • 81 Gas Entities
    • 49 Coal Entities
    • 6 Cement Entities

    The data goes back to 1854 and accounts for over 1.42 trillion tonnes of CO2e, covering 72% of global fossil fuel and cement emissions since the start of the Industrial Revolution in 1751.

    They share data at low, medium, and high levels of granularity. This dataset is the ‘medium’ granularity version, which contains year, entity, entity type, commodity, commodity production, commodity unit, and total emissions.

    Are there any trends or changes that surprised you?

    The Data

    # Option 1: tidytuesdayR package 
    ## install.packages("tidytuesdayR")
    
    tuesdata <- tidytuesdayR::tt_load('2024-05-21')
    ## OR
    tuesdata <- tidytuesdayR::tt_load(2024, week = 21)
    
    emissions <- tuesdata$emissions
    
    
    # Option 2: Read directly from GitHub
    
    emissions <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-05-21/emissions.csv')
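
    Once `emissions` is loaded, a first pass at the question above might total emissions per year and commodity. The sketch below is one way to do that with dplyr. The helper name `yearly_emissions()` and the three-row toy tibble are assumptions made for illustration (the values are invented, not real Carbon Majors figures); pass the real `emissions` data frame instead.

```r
library(dplyr)

# Hypothetical helper: total emissions per year and commodity.
yearly_emissions <- function(emissions) {
  emissions %>%
    group_by(year, commodity) %>%
    summarise(
      total_MtCO2e = sum(total_emissions_MtCO2e, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    arrange(year, commodity)
}

# Toy rows using the column names of emissions.csv (all values are made up).
toy <- tibble(
  year = c(2000, 2000, 2001),
  parent_entity = c("Entity A", "Entity B", "Entity A"),
  parent_type = c("nation state", "state-owned entity", "nation state"),
  commodity = c("Bituminous Coal", "Bituminous Coal", "Natural Gas"),
  production_value = c(10, 20, 5),
  production_unit = c("Million tonnes/yr", "Million tonnes/yr", "Bcf/yr"),
  total_emissions_MtCO2e = c(1.5, 2.5, 0.5)
)

toy_totals <- yearly_emissions(toy)
print(toy_totals)
```

    Summing across entities per year hides who contributes what, so adding `parent_entity` or `parent_type` to the `group_by()` call is a natural next step.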

    How to Participate

    • Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
    • Create a visualization, a model, a shiny app, or some other piece of data-science-related output, using R or another programming language.
    • Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
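
    For the visualization step, one possible starting point is a line chart of yearly emissions by parent type. This is only a sketch under stated assumptions: `plot_by_parent_type()` is a hypothetical helper, and the toy data frame stands in for the real `emissions` data (its values are invented).

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: yearly emissions, one line per parent type.
plot_by_parent_type <- function(emissions) {
  emissions %>%
    group_by(year, parent_type) %>%
    summarise(
      total_MtCO2e = sum(total_emissions_MtCO2e, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    ggplot(aes(x = year, y = total_MtCO2e, colour = parent_type)) +
    geom_line() +
    labs(x = "Year", y = "Emissions (MtCO2e)", colour = "Parent type")
}

# Toy input with only the columns the helper needs (values are made up).
toy <- tibble(
  year = c(2000, 2001, 2000, 2001),
  parent_type = c("nation state", "nation state",
                  "state-owned entity", "state-owned entity"),
  total_emissions_MtCO2e = c(1, 2, 3, 4)
)

p <- plot_by_parent_type(toy)
```

    Printing `p` (or calling `ggsave()`) renders the chart. A chart like this shows how totals move over time, not why they move.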

    Data Dictionary

    emissions.csv

    | variable | class | description |
    |:---|:---|:---|
    | year | double | The year of the data point. |
    | parent_entity | character | The entity to which the emissions are traced. |
    | parent_type | character | The type of parent_entity. Can be one of: investor-owned company, state-owned entity, nation state. |
    | commodity | character | Specifies which commodity the production refers to: Oil and NGL, Natural Gas, Anthracite Coal, Bituminous Coal, Lignite Coal, Metallurgical Coal, Sub-Bituminous Coal, Thermal Coal, or Cement. |
    | production_value | double | The quantity of production. |
    | production_unit | character | The unit of production: Oil & NGL in million barrels, Natural Gas in billion cubic feet, Coal in million tonnes, Cement in million tonnes CO2 (see methodology for explanation). Units - Billion cubic feet per year (Bcf/yr), Million barrels per year (Million bbl/yr), or Million tonnes per year (Million tonnes/yr). |
    | total_emissions_MtCO2e | double | The total emissions traced to the ‘parent_entity’ in the ‘year’. Units - million tonnes of carbon dioxide equivalent (MtCO2e). |
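
    The dictionary above can also be written as an explicit readr column specification, so that a type mismatch surfaces when the file is read instead of being guessed. A minimal sketch, assuming the column names and classes listed above; `emissions_spec` is a name chosen here, and the inline CSV row is a made-up stand-in for the real file.

```r
library(readr)

# Column types taken from the data dictionary (double / character).
emissions_spec <- cols(
  year = col_double(),
  parent_entity = col_character(),
  parent_type = col_character(),
  commodity = col_character(),
  production_value = col_double(),
  production_unit = col_character(),
  total_emissions_MtCO2e = col_double()
)

# One invented row, only to show the spec being applied.
sample_csv <- paste(
  "year,parent_entity,parent_type,commodity,production_value,production_unit,total_emissions_MtCO2e",
  "2000,Entity A,nation state,Natural Gas,5,Bcf/yr,0.5",
  sep = "\n"
)

sample_rows <- read_csv(I(sample_csv), col_types = emissions_spec)
```

    With the real file, the same spec could be passed as `col_types = emissions_spec` to the `readr::read_csv()` call shown under The Data.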

    Cleaning Script

    No data cleaning was needed. The dataset is emissions_medium_granularity.csv from https://carbonmajors.org/Downloads.