I love the maps in Google Analytics and I think the Google Charts API has some great map rendering. That said, I have a couple of hang-ups with them:
- Google Analytics is designed for a snapshot view, so it won’t give you a heatmap of GROWTH. I think that trends in analytics data are more important than snapshots.
- I want to write my own code to produce the map I want locally, without connecting to the internet or relying on the Google Charts API.
Last week I was browsing the Flowing Data website and came across a tutorial on creating a heatmap by county in the USA. I thought it was totally awesome, and it inspired me to do a mini-hackathon at work and produce my own heatmap of countries across the world. Sadly I can’t share what I was using it for at work (penetration and weekly, monthly and yearly growth by country for Facebook), so I decided to throw one together using Google Analytics data for one of my sites and write my own little tutorial (with structure shamelessly stolen from the tutorial I mention above):
Step 0: System Requirements
This was done on Windows using the command prompt, but it works on Mac or Linux too.
- Python 2.5+ installed on your machine
- BeautifulSoup installed on your machine (this is an awesome XML parser for Python)
- Google Analytics account with a reasonable volume of data (or some other country level data set)
- This SVG file (from Wikimedia Commons)
Step 1: make a couple of edits to the svg file
Open up the SVG file. Within it there are paths and groups of paths. Each path at the top level of the XML tree is a country with a single land mass. Each group is also a country, and each path within that group is a land mass for that country (for example, “ar” has 5 land masses in the map in this file).
There was one issue I couldn’t get over/was too lazy to deal with in code, so I just hacked it in the file: somehow BeautifulSoup munges the data around Greenland (feel free to skip this step and see what happens; you can always fix this later). I dealt with this by deleting lines 1962 and 1963, then going to line 2367 (line 2369 before the two deletions above) and removing the </g>.
Step 2: create your country level data from Google Analytics
The following graphic shows you where to pull the data by country from your Google Analytics account. I strongly suggest that you set the drop-down (found at the bottom right of the page) to 500, so that your download contains the full list of countries and not just the default ten.
I grabbed the data for all countries from June 28 to July 28 2010 and from June 29 to July 29 2009. There are two bits of data processing that need to be done to this data:
- map the 2009 data to the 2010 data by country, calculate the delta % (I did this in Excel) and save it to a file called “growthratebycountry.csv”
- map the country names that Analytics gives you to the 2 letter country codes used in the SVG file (I have included this list as a csv file in the zip file I give you at the end of this tutorial)
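If you would rather do the delta % calculation in Python than in Excel, it could look roughly like the sketch below. The visit counts, the growth_pct helper and the choice to skip countries with no 2009 visits are all my own assumptions for illustration (and it is written to run on a current Python 3), not something taken from the original spreadsheet.

```python
def growth_pct(previous, current):
    """Return year-on-year growth as a percentage, or None if undefined."""
    if previous == 0:
        return None  # growth from zero visits has no meaningful percentage
    return (current - previous) * 100.0 / previous

# Hypothetical visit counts keyed by 2 letter country code.
visits_2009 = {"gb": 200, "ua": 50, "ar": 0}
visits_2010 = {"gb": 250, "ua": 150, "ar": 40, "fr": 10}

growth = {}
for code, previous in visits_2009.items():
    if code not in visits_2010:
        continue  # country only appears in one of the two years
    pct = growth_pct(previous, visits_2010[code])
    if pct is not None:
        growth[code] = pct
```

Here “gb” comes out at 25% and “ua” at 200%, while “ar” and “fr” are dropped because one side of the comparison is missing or zero.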
Step 3: create your python script, I called mine runner.py
that is all :)
Step 4: import needed modules
We will need to import the csv file with the year on year growth rate and we will need to import the svg file then read it into BeautifulSoup to parse it. As such we have to import the csv and BeautifulSoup modules at the start of our python script.
Step 5: read in your country level data
First, set up the penetration dictionary. Then choose the file you want to read in and set the delimiter: “growthratebycountry.csv” is the name of the csv file I saved the growth data to and “,” is the delimiter. Finally, loop through the rows and, for each one, set the key in the penetration dictionary to the first column (containing the country’s 2 letter id) and the value to the second column (containing the % growth).
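As a rough sketch of this step (not the exact code from runner.py), the loop might look like this. The load_growth function name, the lower-casing of ids and the skipping of header or blank rows are my own assumptions; the sample data stands in for the real file so the snippet runs on its own under Python 3.

```python
import csv
import io

def load_growth(lines):
    """Build {country_id: growth_pct} from rows shaped like 'id,growth'."""
    penetration = {}
    for row in csv.reader(lines, delimiter=","):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        try:
            # Assumption: ids are lower-cased to match ids such as 'gb'.
            penetration[row[0].strip().lower()] = float(row[1])
        except ValueError:
            continue  # skip a header row or a non-numeric value
    return penetration

# Stand-in for the real file; with a file on disk you would pass
# open("growthratebycountry.csv", newline="") instead.
sample = io.StringIO("id,growth\ngb,25.0\nUA,200\n\n")
penetration = load_growth(sample)
```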
Step 6: read in your map and load into BeautifulSoup
I saved the edited map above to “countries.svg”.
One thing that’s special about BeautifulSoup is that you need to tell it which tags are self-closing, so the 3 tags mentioned in “selfClosingTags=” are the 3 tags in this file that are self-closing.
Step 7: set up your country color levels
Flowingdata.com pointed out the awesome service: http://colorbrewer2.org/ which allows you to pick a color list.
Given I am a Facebook boy, I decided I like blue for my color palette. I put the hex codes into a list called colors (numbering starts from zero), with entries such as “#9ECAE1”. We will use this later to color in the map.
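For illustration, here is what such a list could look like. I am assuming the 6-class “Blues” scheme from ColorBrewer, ordered lightest to darkest; your own pick from the site may have a different number of classes.

```python
# ColorBrewer "Blues", 6 classes, lightest to darkest (an assumed choice).
colors = ["#EFF3FF", "#C6DBEF", "#9ECAE1", "#6BAED6", "#3182BD", "#08519C"]

lightest = colors[0]   # list numbering starts from zero
darkest = colors[-1]   # a negative index counts back from the end
```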
Step 8: Find all the countries using BeautifulSoup
There are two sets of countries, as mentioned above: those with multiple land masses (enclosed in “g”) and those with single land masses (enclosed in “path”). BeautifulSoup has a function “findAll” that allows you to find all occurrences of a certain XML tag. By default recursive is set to True, which means that for the XML tag ‘path’ it will find >1000 matches (i.e. one for each land mass, not one for each country). This is clearly sub-optimal. Setting recursive=False allows you to pick only the top level nodes, which are all countries.
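To make the top-level versus everything distinction concrete without needing BeautifulSoup installed, here is the same idea using Python’s built-in xml.etree.ElementTree instead. This is a swapped-in library, not what runner.py uses: findall("path") on the root plays the role of recursive=False and iter("path") plays the role of recursive=True. The tiny SVG string is made up, and it omits the xmlns declaration that a real SVG file carries (with it, ElementTree tag names would need the namespace prefix).

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for the map: one single land mass country and
# one country made of a group of two land masses.
svg = (
    '<svg>'
    '<path id="gb" class="land gb" d="M0 0"/>'
    '<g id="ar" class="land ar">'
    '<path d="M1 1"/><path d="M2 2"/>'
    '</g>'
    '</svg>'
)
root = ET.fromstring(svg)

top_level_paths = root.findall("path")  # direct children only
top_level_groups = root.findall("g")    # direct children only
all_paths = list(root.iter("path"))     # every path, however deeply nested
```

The top-level searches find one country each, while the recursive search finds all three land masses.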
Step 9: Replace the style for every country with the colors you have chosen
Set the basic style you will want each node to have and end it with “fill:”. We will subsequently be adding a hexadecimal color onto the end there based on the growth rate.
Now run through each of the paths you selected in Step 8 and update the color if there is a penetration with that id. I have specifically included “if ‘land’ in p[‘class’]:” since the only paths we want to recolor are those which represent land masses, and in this file they always have the class “land XX” (where XX is the country id, e.g. ‘gb’). The style attribute of each node is accessed via “p[‘style’]”.
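One way to turn a growth rate into the final style string is sketched below. The thresholds, the stroke settings in base_style and the helper names are placeholders I made up for illustration; pick cut-offs that suit the spread of your own data.

```python
# ColorBrewer "Blues", 6 classes, lightest to darkest (an assumed choice).
colors = ["#EFF3FF", "#C6DBEF", "#9ECAE1", "#6BAED6", "#3182BD", "#08519C"]

# Hypothetical upper bounds (in % growth) for the first five color classes.
thresholds = [0, 25, 50, 100, 200]

# Basic style ending in "fill:"; the stroke values are placeholders.
base_style = "stroke:#ffffff;stroke-width:0.5;fill:"

def color_class(growth):
    """Return the index of the first class whose upper bound exceeds growth."""
    for i, upper in enumerate(thresholds):
        if growth < upper:
            return i
    return len(thresholds)  # anything at or above the last bound

def style_for(growth):
    """Append the hex color for this growth rate to the basic style."""
    return base_style + colors[color_class(growth)]
```

With a helper like this, the body of the loop reduces to assigning style_for(penetration[id]) to the node’s style attribute.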
Finally we need to go through all the countries represented by groups of land masses and update the color for all of their individual land masses:
The part at the end, “for t in g.findAll(‘path’, recursive=True):”, loops through ALL the sub-paths within a group and updates their style to match the color that we want the group to appear. Note that this differs from Step 8, where we set “recursive=False”.
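The group logic can be sketched with the standard library’s xml.etree.ElementTree, which I am using here as a stand-in for BeautifulSoup so the snippet runs without extra installs. The SVG string, the penetration value and the style are all made up, and I am assuming each group carries its country code in its id attribute.

```python
import xml.etree.ElementTree as ET

# Made-up map: "ar" has two land masses (one nested deeper), "xx" has one.
svg = (
    '<svg>'
    '<g id="ar" class="land ar"><path d="M1 1"/><g><path d="M2 2"/></g></g>'
    '<g id="xx" class="land xx"><path d="M3 3"/></g>'
    '</svg>'
)
root = ET.fromstring(svg)

penetration = {"ar": 40.0}                  # hypothetical growth data
new_style = "stroke:#ffffff;fill:#6BAED6"   # hypothetical style string

for g in root.findall("g"):                 # top-level groups only
    if g.get("id") in penetration and "land" in g.get("class", ""):
        for t in g.iter("path"):            # ALL paths inside the group
            t.set("style", new_style)

styled = [p.get("style") for p in root.iter("path")]
```

Both land masses of “ar” pick up the new style, including the nested one, while “xx” is left alone because there is no data for it.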
Step 10: Output this to file and correct one issue
The last step is to write this out to a file. In this case I have called the file “newfile.svg”. The code str(soup) outputs soup as a string. You have various other options here, such as soup.prettify(), which you can look up yourself, but I prefer this (for no particular reason). The “.replace(‘viewbox’, ‘viewBox’, 1)” is there since BeautifulSoup converts all attributes to lower case when rendering. Unfortunately SVG attribute names are case-sensitive and the attribute has to be spelled viewBox, so I have found that to get the maps to render correctly in Firefox you need to make this replacement.
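The replacement itself is plain string handling, so it can be tried on its own. In this small sketch the rendered string is a made-up stand-in for str(soup) and fix_viewbox is a name I chose for illustration.

```python
def fix_viewbox(markup):
    """Restore the capital B in the first 'viewbox' found in the markup."""
    return markup.replace("viewbox", "viewBox", 1)

# Stand-in for str(soup) after the attribute names have been lower-cased.
rendered = '<svg viewbox="0 0 100 50"><path d="M0 0"/></svg>'
fixed = fix_viewbox(rendered)

# Writing the result out would then be along the lines of:
#     with open("newfile.svg", "w") as f:
#         f.write(fixed)
```

The third argument of 1 limits the replacement to the first occurrence, which is enough if the only viewBox attribute sits on the root svg element.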
Step 11: run the script
All the files involved here (“countries.svg”, “growthratebycountry.csv” and “runner.py”) need to be in the same directory, since I have only used relative paths and not absolute ones. Once that is the case, navigate to the folder they are in and run “runner.py” using the command “python runner.py”.
Step 12: View your beautiful “newfile.svg”
As you can see I have a lot of growth in the Ukraine :) and since I have limited this to only countries with >100 monthly visits you can also get an indication of the reach of my website worldwide.
Anyway, I am really proud of this and am spending all my time at the moment making country level heatmaps for everything I can at work. I hope you can now too :) and if you like seeing a growth based view of your Google Analytics data, try my new application “fast moving keywords”. I definitely intend to include this map in it going forward, once I have worked out how to output an SVG as a PNG using Python!
You can always make a cocktail to celebrate when you finish this too using the cocktail site analyzed above.