    Diwali Sales Data

    This week is Diwali, the festival of lights! This week's data comes from sales records for a retail store during the Diwali festival period in India. The data is shared on Kaggle by Saad Haroon.

    This week we’re sharing Python data analysis examples! There are a few out there, including data exploration analyses from Brushan Shelke and Vikas Vachheta (see the Diwali_Sales_Analysis.ipynb file for the code).

    The Data

    # Option 1: tidytuesdayR package
    ## install.packages("tidytuesdayR")

    tuesdata <- tidytuesdayR::tt_load('2023-11-14')
    ## OR
    tuesdata <- tidytuesdayR::tt_load(2023, week = 46)

    diwali_sales_data <- tuesdata$diwali_sales_data

    # Option 2: Read directly from GitHub

    diwali_sales_data <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2023/2023-11-14/diwali_sales_data.csv')

    How to Participate

    • Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
    • Create a visualization, a model, a Shiny app, or some other piece of data-science-related output, using R or another programming language.
    • Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
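    As a starting point for the exploration step above, here is a minimal sketch of a per-category summary. It assumes the dplyr package is installed. The toy_sales rows are invented for illustration only (they are not taken from diwali_sales_data.csv) so that the snippet runs without a download; the column names follow the data dictionary.

```r
library(dplyr)

# Toy rows that only mimic the dataset's column names. The values are
# invented for illustration and are NOT from diwali_sales_data.csv.
toy_sales <- tibble(
  Product_Category = c("Food", "Food", "Clothing", "Clothing", "Clothing"),
  Zone = c("Central", "Western", "Central", "Central", "Southern"),
  Orders = c(1, 3, 2, 4, 1),
  Amount = c(100, 250, 400, 150, 50)
)

# Count rows and total the orders and amount for each product category,
# then sort so the category with the largest total amount comes first.
category_summary <- toy_sales %>%
  group_by(Product_Category) %>%
  summarise(
    n_rows = n(),
    total_orders = sum(Orders),
    total_amount = sum(Amount, na.rm = TRUE)
  ) %>%
  arrange(desc(total_amount))

category_summary
```

    To try this on the real data, replace toy_sales with the data frame loaded in The Data section. As noted above, treat any pattern in the summary as descriptive rather than causal.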

    Data Dictionary

    diwali_sales_data.csv

    variable          class      description
    User_ID           double     User identification number
    Cust_name         character  Customer name
    Product_ID        character  Product identification number
    Gender            character  Gender of the customer (e.g. Male, Female)
    Age Group         character  Age group of the customer
    Age               double     Age of the customer
    Marital_Status    double     Marital status of the customer, stored as a numeric code (e.g. for Married, Single)
    State             character  State of the customer
    Zone              character  Geographic zone of the customer
    Occupation        character  Occupation of the customer
    Product_Category  character  Category of the product
    Orders            double     Number of orders made by the customer
    Amount            double     Amount in Indian rupees spent by the customer
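    The classes listed in the dictionary can also be stated explicitly when reading the file, instead of relying on readr's type guessing. The sketch below assumes readr 2.0 or later (for literal input via I()). The one-row toy_csv is invented so the snippet runs offline; its values are hypothetical and not from the dataset. Note that Age Group contains a space, so it needs backticks in R.

```r
library(readr)

# Column types transcribed from the data dictionary.
# `Age Group` contains a space, so it must be wrapped in backticks.
diwali_col_types <- cols(
  User_ID = col_double(),
  Cust_name = col_character(),
  Product_ID = col_character(),
  Gender = col_character(),
  `Age Group` = col_character(),
  Age = col_double(),
  Marital_Status = col_double(),
  State = col_character(),
  Zone = col_character(),
  Occupation = col_character(),
  Product_Category = col_character(),
  Orders = col_double(),
  Amount = col_double()
)

# Invented header plus one row (hypothetical values) so the sketch runs offline.
toy_csv <- paste(
  "User_ID,Cust_name,Product_ID,Gender,Age Group,Age,Marital_Status,State,Zone,Occupation,Product_Category,Orders,Amount",
  "1,Example,P000,F,26-35,28,0,ExampleState,ExampleZone,ExampleJob,ExampleCategory,1,100",
  sep = "\n"
)

# I() tells read_csv to treat the string as literal data, not a file path.
toy_data <- read_csv(I(toy_csv), col_types = diwali_col_types)

# Columns with spaces are accessed with backticks or [[ ]].
toy_data[["Age Group"]]
```

    For the real file, the same col_types argument can be passed to the readr::read_csv() call shown in The Data section.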

    Cleaning Script

    The data was downloaded from Kaggle, and the Status and unnamed1 columns were removed.

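    library(tidyverse)

    # Read the raw file downloaded from Kaggle
    diwali_sales_data <- read_csv("DiwaliSalesData.csv")

    # Drop the Status and unnamed1 columns
    diwali_sales_data <- diwali_sales_data %>% select(!c(Status, unnamed1))

    # Write the cleaned data used for this week
    write_csv(diwali_sales_data, "diwali_sales_data.csv")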