Importing global weather data since 1850 into Tableau

Optimizing your supply chain is a never ending task. Recently a client asked us to explore localized weather patterns with Tableau. This article shows how we were able to massage the data before importing it into Tableau.

We will use the NOAA (National Centers for Environmental Information) and GSOM (Global Summary of the Month) weather data into Tableau to do some analytics.

https://www.ncdc.noaa.gov/cdo-web/datasets/GSOM/stations/GHCND:USW00026617/detail

Data

The data from NOAA comes in the forms of roughly 90,000 individual csv files, each file representing a single weather station whereby the data can go back to as far as the 1850’s

Two master data files, stations and countries will be used to join additional information such as the longitude and latitude and the country names which will enable you to plot the data on a map in Tableau

Sample weather station data

Descriptions of all the field data can be found here https://www1.ncdc.noaa.gov/pub/data/cdo/documentation/gsom-gsoy_documentation.pdf

Tools

The first thing to consider is that using Tableau to connect straight to the data will be very difficult because of the sheer volume of csv files that you would need to connect to. To integrate all the data you need to use Dataverse which is provided for free up to 2 million records. The tool also plays well with Tableau in that it can export the final result directly to a TDE.

Download a free version of Dataverse.

Dataverse provides a graphical drag and drop environment. similar to Tableau, to create data flows and blend your business data together into appropriate structures for analysis. Its iterative approach is well suited to analytics projects and the interface allows for easy validation of business rules and results.

Dataflow

Weather Dataverse Complete Workflow

The data flow will firstly get all the file names for the files from Spain (starting with SP) as well as from South Africa. It will then combine all the file names into one list and extract the information in the files into memory.

The next section is the transform, where I get rid of the columns that are not needed and also change the date value in the file so that it includes day. Tableau is not able to work with dates that do not have year month and day.

Next the station data and country data are joined in, and finally I export all the data into a TDE so that it can be used by Tableau.

Installation

After you have installed the latest version of Dataverse, you need to:

  1. Download the dataflow file for Dataverse here
  2. Once you open Dataverse, import the file
  3. Download the GSOM data files. A zip file of about 800 mb can be found here. This Zip file also contains the already loaded Tableau TDE and a sample Tableau workbook
  4. Extract the Zip file

Because the files will most likely be stored in a separate folder on your machine then the one that was used to create the work flow, you simply need to update the path for some of the connectors. To do so, click on them and the panel on the right which will pull up the properties dialog. The path will be listed here which you can update to your own.

The connectors to update are:

  1. Spain Data (Pulls in files only for Spain)
  2. SA Data (Pulls in files for only South Africa)
  3. Station Data
  4. Country Data
  5. Output Tableau (Path where your Tableau TDE will be written to)

Update File Paths

Viola! You are now able to run all the whole data flow to output your TDE

What Next?

The dataflow that was created only pulls in data for Spain and South Africa, if you would like to see weather data for other countries, open the countries csv in the countries folder and find the country ID. In either the South Africa or Spain connectors, change the pattern from SP*.csv to NO*.csv to get all Norway weather data.

You now have the worlds historic weather data at your fingertips! If you have any questions please let me know in the comments section and I’ll try and help out where I can.

Leave a Reply