Data Sources and Files

Raw Data - Extract

Data for COVID-19 cases, COVID-19 deaths and mask use by county and state were sourced from New York Times CSV files hosted on the newspaper's GitHub page, while population data by county and state was obtained from a US Census CSV file.

Data Cleanup - Transform

The team used Jupyter notebooks to clean up the raw data. The cleanup process included dropping unneeded columns, joining columns and filtering rows. Two Jupyter notebooks were used to clean the data; see the links below for the code.

  • cleantables
  • geocodes

The production database is a relational PostgreSQL database hosted on AWS.
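
The cleanup steps named in this section (dropping columns, joining columns and filtering rows) can be sketched as follows. This is a minimal illustration, not the code in the linked notebooks: the sample rows are made up, the input column names (fips, STATE, COUNTY, POPESTIMATE2019) are assumptions about the raw CSV layout, and the function names are hypothetical. It uses only the Python standard library so it runs without extra packages; the notebooks themselves may well use a different approach.

```python
import csv
import io

# Made-up sample rows standing in for the raw CSV files (assumed column names).
RAW_CASES = """date,county,state,fips,cases,deaths
2020-12-01,Autauga,Alabama,01001,2780,42
2020-12-01,Unknown,Alabama,,0,0
"""

RAW_POPULATION = """STATE,COUNTY,CTYNAME,POPESTIMATE2019
01,000,Alabama,4903185
01,001,Autauga County,55869
"""


def clean_cases(raw_text):
    """Drop unneeded columns and filter out rows that have no county code."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        if not row["fips"]:  # filter rows: no county code to key on
            continue
        cleaned.append({
            # keep only the columns listed for cases_death; date and names are dropped
            "county_fips": row["fips"],
            "state_fips": int(row["fips"][:2]),  # first two digits identify the state
            "cases": int(row["cases"]),
            "deaths": int(row["deaths"]),
        })
    return cleaned


def clean_population(raw_text):
    """Join the state and county code columns into a single county_fips key."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        if row["COUNTY"] == "000":  # filter rows: state-level totals, not counties
            continue
        cleaned.append({
            "county_fips": row["STATE"] + row["COUNTY"],  # join columns
            "state_fips": int(row["STATE"]),
            "2019_population": int(row["POPESTIMATE2019"]),
        })
    return cleaned


cases = clean_cases(RAW_CASES)
population = clean_population(RAW_POPULATION)
print(cases)
print(population)
```

Keeping county_fips as text preserves leading zeros such as "01001", which matches the Varchar type the data dictionary gives that column.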

Clean Data - Load

Six relational tables, each with primary and foreign keys, were produced from the cleaned data. See the link below for the CSV files.

  • cleantables

Data Dictionary

Definition of data in tables

| Name of Table | Column Name | Description | Data type | Character length | Accepts null value |
|---|---|---|---|---|---|
| cases_death | county_fips | Unique code for counties in the country | Varchar | 255 | No |
| cases_death | state_fips | Unique code for states in the country | Integer | 10 | No |
| cases_death | cases | Number of COVID-19 cases in county | Integer | 10 | No |
| cases_death | deaths | Number of COVID-19 deaths in county | Integer | 10 | No |
| county | county_fips | Unique code for counties in the country | Varchar | 255 | No |
| county | county_name | Name of counties in the country | Varchar | 255 | No |
| county_state | county_fips | Unique code for counties in the country | Varchar | 255 | No |
| county_state | state_fips | Unique code for states in the country | Varchar | 255 | No |
| mask_usage | county_fips | Unique code for counties in the country | Varchar | 255 | No |
| mask_usage | state_fips | Unique code for states in the country | Integer | 10 | No |
| mask_usage | never | Percent of respondents who never wear a mask | Varchar | 255 | No |
| mask_usage | rarely | Percent of respondents who rarely wear a mask | Varchar | 255 | No |
| mask_usage | sometimes | Percent of respondents who sometimes wear a mask | Varchar | 255 | No |
| mask_usage | frequently | Percent of respondents who frequently wear a mask | Varchar | 255 | No |
| mask_usage | always | Percent of respondents who always wear a mask | Varchar | 255 | No |
| population | county_fips | Unique code for counties in the country | Varchar | 255 | No |
| population | state_fips | Unique code for states in the country | Integer | 10 | No |
| population | 2019_population | Population count of county based on 2019 forecast by the US Census Bureau | Integer | 10 | No |
| state | state_fips | Unique code for states in the country | Integer | 10 | No |
| state | state_name | Name of states in the country | Varchar | 255 | No |
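
As an illustration of how the dictionary above could map to table definitions, here is a minimal sketch covering three of the six tables. The choice of primary and foreign key columns is an assumption, since this document does not state which columns are keys. The sketch runs against an in-memory SQLite database purely so it is self-contained; it is a stand-in for the PostgreSQL production database, not the project's actual load code, and the sample values are made up.

```python
import sqlite3

# Column names, types and NOT NULL come from the data dictionary.
# Which columns are primary or foreign keys is an assumption for illustration.
SCHEMA = """
CREATE TABLE state (
    state_fips INTEGER PRIMARY KEY,
    state_name VARCHAR(255) NOT NULL
);
CREATE TABLE county (
    county_fips VARCHAR(255) PRIMARY KEY,
    county_name VARCHAR(255) NOT NULL
);
CREATE TABLE population (
    county_fips VARCHAR(255) PRIMARY KEY REFERENCES county (county_fips),
    state_fips INTEGER NOT NULL REFERENCES state (state_fips),
    "2019_population" INTEGER NOT NULL  -- quoted: the name starts with a digit
);
"""

conn = sqlite3.connect(":memory:")        # stand-in for the PostgreSQL database
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript(SCHEMA)

# Made-up sample rows; parent tables are loaded before the tables that reference them.
conn.execute("INSERT INTO state VALUES (1, 'Alabama')")
conn.execute("INSERT INTO county VALUES ('01001', 'Autauga')")
conn.execute("INSERT INTO population VALUES ('01001', 1, 55869)")

# A row pointing at a county that does not exist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO population VALUES ('99999', 1, 10)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM population").fetchone()[0]
print(fk_rejected, row_count)
```

Because 2019_population begins with a digit, it has to be written as a double-quoted identifier in queries, which is worth keeping in mind when reading the column from the production tables.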