Depth and Discovery: Powering Visualizations with the Google Analytics API

At Juice, we work with web analytics APIs large and small, from Google, comScore and Omniture. The Google Analytics API is our favorite. It powers the world's best, most widely deployed analytics site. And it powers Juice products like Concentrate (innovative search analytics) and Vasco de Gapi (a tool for exploring the Google Analytics API).

The Google Analytics API team approached us to explore new ways of looking at data with the API, and we were excited by the possibilities. We've been working on our own visualization framework, JuiceKit, which integrates the power of the Flare Visualization Library with Adobe Flex.

The result is Analytics Visualizations, two visualizations powered by the Google Analytics API that are free to use. You just need a Google account with access to Google Analytics data to explore your own data.

Analytics Visualizations Home Page

Referrer Flow

Curious about which sites are linking to you and which content is benefiting the most? Referrer Flow answers those questions and shows how the results change over time. Here is a brief video introduction:

Referrer Flow is a stream of daily treemaps showing pageviews and bounce rates for various groupings of your website's pages. You can group by combinations of page title, referrer and URL. Clicking on the treemap filters all the data by the page, referrer or URL that you clicked on. Click again to clear your filter.
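To make the grouping concrete, here is a minimal Python 3 sketch of the kind of aggregation that could sit behind each daily treemap. It is not the Referrer Flow source: the row layout, the sample figures, the function name daily_treemaps and the keep predicate (standing in for the click filter) are all assumptions made for illustration, and bounce rate is assumed to be bounces divided by entrances.

```python
from collections import defaultdict

# Hypothetical rows: (date, referrer, page_title, pageviews, bounces, entrances).
ROWS = [
    ("2009-12-01", "google.com", "Home", 120, 30, 60),
    ("2009-12-01", "google.com", "Blog", 80, 40, 50),
    ("2009-12-01", "twitter.com", "Blog", 40, 10, 40),
    ("2009-12-02", "google.com", "Home", 100, 20, 50),
]

# Column index of each field that can be used for grouping.
FIELDS = {"referrer": 1, "title": 2}


def daily_treemaps(rows, group_by=("referrer",), keep=None):
    """Return {date: {group_key: {"pageviews": ..., "bounce_rate": ...}}}.

    keep is an optional predicate that plays the role of the click filter.
    """
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0, 0]))
    for row in rows:
        if keep is not None and not keep(row):
            continue
        key = tuple(row[FIELDS[name]] for name in group_by)
        cell = totals[row[0]][key]
        cell[0] += row[3]  # pageviews
        cell[1] += row[4]  # bounces
        cell[2] += row[5]  # entrances
    result = {}
    for day, groups in totals.items():
        result[day] = {
            key: {"pageviews": pv, "bounce_rate": (b / e) if e else 0.0}
            for key, (pv, b, e) in groups.items()
        }
    return result


maps = daily_treemaps(ROWS, group_by=("referrer",))
```

Each inner dictionary corresponds to one treemap: pageviews would size a cell and bounce rate would color it. Passing keep=lambda row: row[2] == "Blog" mimics clicking on the Blog page to filter everything else out.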

Keyword Tree

A list of top keywords isn't enough to really understand how people are searching and finding your site. Keyword Tree visually displays the most frequently used search keywords and how they are used together. Here's a video overview:

You'll see a frequently used search term at the center, surrounded by the words and phrases that are most often used in combination with it. Pick a different starting word by typing into the box in the upper right or by selecting from the top words across the bottom of the screen. The words are sized by their frequency of use and colored by bounce rate (or % new visitors or average time on site). Roll over a word to see details about that combination of connected words.
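As a rough illustration of the idea, the Python 3 sketch below counts which words directly follow a chosen starting word, weighted by visits. It is a simplification and not the Keyword Tree code: the function name following_words is hypothetical, the phrases are sample data, and only the word immediately after the starting word is considered.

```python
from collections import defaultdict

# Sample (search phrase, visits) pairs used only for illustration.
PHRASES = [
    ("juice analytics", 356),
    ("excel training", 142),
    ("excel charts", 77),
    ("excel skills", 72),
    ("colbert bump", 70),
]


def following_words(phrases, root):
    """Sum visits for every word that directly follows `root` in a phrase."""
    counts = defaultdict(int)
    for phrase, visits in phrases:
        words = phrase.lower().split()
        for i, word in enumerate(words[:-1]):
            if word == root:
                counts[words[i + 1]] += visits
    # Largest branches first, as they would be drawn biggest in the tree.
    return sorted(counts.items(), key=lambda item: -item[1])


branches = following_words(PHRASES, "excel")
```

Applying the same function again to each branch word would give the next level of the tree.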

Depth and Discovery

In designing these visualizations, we focused on one question: how can we let users uncover the unexpected? That means designing targeted visualizations focused on a limited set of well-defined issues. Referrer Flow monomaniacally focuses on a single question: "What pages are people viewing on your site and where are they coming from?" Keyword Tree is laser-focused on word ordering and what that means for keyword performance.

The Google Analytics reporting tool is a great general-purpose reporting solution. It gives advanced users everything they need to answer specific questions. However, its generality limits its ability to focus on two issues: depth and discovery.

The Google Analytics API is Google's solution to this problem. It's an opportunity both for businesses like ours that can create new ways of analyzing data, and for large sites that can use the API for integration, custom analytics, and more.

Thanks to Nick Mihailovski at Google for his gracious support, help and encouragement and Avinash Kaushik for inspiring this idea.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. All source code is released under a BSD License unless otherwise specified.

8 comments (only the last 5 are shown)


December 15, 2009
DSLR said:

I'm new with GA and co. but your tool is really useful, at a glance you can read a lot of things...To improve readability of both views (referrer flow in particular) you can add a "loupe", a magnifier on screen movable by mouse to expand details of the chart. Thanks again from Italy!


January 4, 2010
Affan Laghari said:

Hello,
Excellent tool though it doesn't need my praise! It would be very helpful though if you can add an option to select start/end dates and some conversion metric. That can help find valuable patterns over longer periods.
Btw, I found you people from Avinash's blog and have been roaming around on your other tools namely Vasco de Gapi, Concentrate Me and JuiceKit. Rare to find such intelligent tools. Please keep up the good work.


January 4, 2010
yulia said:

Hi guys, found your site through Avinash's blog. I love the keyword tree tool. Been playing with it all day...

Question -- is there a way to print the trees? Also, is there a way to scroll? Those would be nice functionalities... Sorry if they are already there and I'm just too slow to find them :)

Thanks for the great (and really useful) tools!


March 4, 2010
Jean said:

Hello,

Is Juicekit still actively developped ? In the git repo the last commit date is August 30, 2009.

Does juicekit work with flex 4 and Flash Builder 4 beta ?

Hope this fantastic tool will continue to improve.


March 4, 2010
Sal Uryasev said:

Hey Jean,

Juicekit is under very active development, as we actively use it internally for all our work. If you investigate the unstable branch, you will notice a number of new features and improvements. There is also more work on an internal branch that should get merged in. As far as I know, we do not have a stable release ETA, but I know that we want to do one.


Programmatic Google Trends API

Updated October 21, 2009

Yesterday, Google released an update to their popular Google Trends tool. There are improvements over the previous version, but the biggest new feature is a shiny new button that lets you download all your data as a CSV file. This is a very cool enhancement. Where Google Trends was once a geeky toy, it can now feed directly into analysts' reports and, with that, edge its way onto managerial desks.

This Python module is a quasi-API that makes it easier to authenticate with Google Trends, for those who want to squeeze an extra level of functionality out of their data. The advantage of programmatic access is that the data can be automatically trended and merged. It can be snuck into a 9:00 AM daily email to the VP of Marketing so that she knows to ramp up Google AdWords campaigns for some specific keyword. Also, by programmatically pulling multiple reports, it is possible to build up a wealth of data not visible in a single report. Using one keyword as a benchmark to merge multiple reports, we can do a meaningful comparison across tens or hundreds of relevant keywords.
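The benchmark idea can be sketched in a few lines of Python 3. This is not part of pyGTrends: the function merge_on_benchmark, the dictionary layout (keyword mapped to a list of values over the same dates) and the sample numbers are assumptions for illustration. The sketch also assumes that values within a single report share one scale, so dividing by the benchmark's average puts different reports on a common footing.

```python
def merge_on_benchmark(reports, benchmark):
    """Rescale each report so the benchmark keyword averages 1.0, then merge.

    reports is a list of {keyword: [values...]} dictionaries, each of which
    must contain the benchmark keyword.
    """
    merged = {}
    for report in reports:
        series = report[benchmark]
        mean = sum(series) / len(series)
        if mean == 0:
            raise ValueError("benchmark keyword has no volume in this report")
        for keyword, values in report.items():
            merged[keyword] = [value / mean for value in values]
    return merged


# Two hypothetical reports on different scales that share the keyword 'bread'.
report_a = {"bread": [2.0, 4.0], "banana": [1.0, 3.0]}
report_b = {"bread": [10.0, 20.0], "bakery": [5.0, 15.0]}

merged = merge_on_benchmark([report_a, report_b], benchmark="bread")
```

After rescaling, 'banana' and 'bakery' can be compared directly even though they never appeared in the same report.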

To use pyGTrends, the quasi-Google-Trends-API, download the latest version from GitHub.

Here is an example of the most basic report that you can pull down from Google Trends. The pyGTrends constructor needs your authentication info, and download_report needs to be passed a tuple of keywords.

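from pyGTrends import pyGTrends

# Log in with your Google account credentials.
connector = pyGTrends('google username', 'google password')
# Request a report for a tuple of keywords, then print the main CSV section.
connector.download_report(('keyword1', 'keyword2'))
print(connector.csv())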

You can, however, use pyGTrends to get any slice of data that you can pull down from Google Trends. To see the exact parameters that you should use, go to Google Trends, and navigate to the specific sufficiently-narrow report that you are interested in. Then, right-click on the CSV download, and save the link location. The different parameters should be discernible from the link. The following code downloads a report for banana, bread, and bakery keywords from April 2008, originating from the magnificent nation of Austria, and scaled using fixed scaling (aka the second download link).

connector.download_report(('banana', 'bread', 'bakery'), 
                          date='2008-4', 
                          geo='AT', 
                          scale=1)
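If you have copied a CSV download link, Python's standard library can pull its parameters apart for you. The Python 3 sketch below uses a made-up link: the host, the path and the parameter name q are placeholders rather than the real Google Trends URL, and link_to_kwargs is a hypothetical helper, not part of pyGTrends.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical link; substitute the one you copied from the CSV download button.
EXAMPLE_LINK = (
    "http://trends.example.com/export"
    "?q=banana,bread,bakery&date=2008-4&geo=AT&scale=1"
)


def link_to_kwargs(link):
    """Split a download link into a keyword tuple and the remaining parameters."""
    query = parse_qs(urlparse(link).query)
    # parse_qs returns a list per parameter; keep the first value of each.
    params = {name: values[0] for name, values in query.items()}
    # 'q' is an assumed name for the comma-separated keyword parameter.
    keywords = tuple(params.pop("q", "").split(","))
    return keywords, params


keywords, params = link_to_kwargs(EXAMPLE_LINK)
```

Note that every value comes back as a string, so a numeric parameter such as scale would need converting with int() before being passed on.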

By default, the csv() function downloads the main part of the report, but there are a few additional parts stuck to the bottom of the CSV file. If you are interested in those, pass the section parameter to the csv() function. The following will return the Language section.

print(connector.csv(section='Language'))

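For most uses, we recommend passing the result to csv.reader or csv.DictReader from Python's csv module.

from csv import DictReader
# Iterate to get one dict per row; printing the reader itself only shows the object.
for row in DictReader(connector.csv().split('\n')):
    print(row)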

Here is a snapshot from the new Google Trends to add some eye-candy to the post: Google Trends Eye-Candy


39 comments (only the last 5 are shown)


March 2, 2010
Aloysius Adrian said:

@David
I, also, can not get the connect.csv()

I contacted Mr. Uryasev, and he suggested that I try print connect.raw_data

I did that, but the result/output was : "You must be signed in to export data from Google Trends"

I wonder if you also get that kind of output..


April 13, 2010
Eric Wilson said:

I have the same problem as David and Aloysius. If anyone has overcome this, please let me know.


April 13, 2010
D'Artagnan said:

How do you select multiple years? e.g. 2008 2009 I've tried 2008-2009 and 1/2008 12m but no luck...


June 11, 2010
David Drace said:

Thanks so much for this. I've just launched a site that uses this API to build a U.S. map tracking "flea" search activity, indicating flea prevalence. Have a look: http://banfieldfleafighter.com.

To do this, I set up a cron to poll Google Trends once every morning and write a small CSV file. This file gets parsed by PHP and fed to Flash with every visit to the site.

Aside from the very occasional "Cannot parse GALX out of login page" error, it works like a charm.


July 19, 2010
Sherin seo said:

Great post..It will be really helpful in my reporting structuring !!!Can u just update something about Machros???I am little confused about it!!

For sherin--> www.copperbridgemedia.com ..


Juiced Google Analytics Python API

Due to the release of an official Google Analytics Data Export API, this module is now deprecated. We have an alternative python module based upon the real analytics API here, and an exploring tool with an automatic code generation capability here.

It is not official. It is not from Google. It is, however, very functional and very here. I present to you pyGAPI, the Juiced Google Analytics Python API. This module allows you to pull information from your incarnation of Google Analytics and use it programmatically in your reporting code.

Let us use IPython to peek through some code using pyGAPI.

In [3]: from datetime import date
In [4]: import pyGAPI
In [5]: connector = pyGAPI.pyGAPI(username, password, website_id="1234567")

Here we create a pyGAPI object. Behind the scenes, pyGAPI logs into Google Analytics and downloads an identifier cookie. website_id is optional. If omitted, pyGAPI accesses the first website on the account's list. To get a list of all the site IDs to which your account has access, call connector.list_sites().

In [6]: connector.download_report('KeywordsReport', (date(2008,3,10), date(2008,3,31)), limit=5)

This downloads a report into your pyGAPI object. KeywordsReport is the name of the report. It is followed by a tuple containing the start and end dates as Python date objects. limit is an optional parameter that specifies the number of entries that pyGAPI should pull down. By default, it will pull in all the entries up to a maximum of 10000. Lowering this number will improve performance. The entries returned are ranked by Visits, so you should get the most significant values of the bunch.

In [7]: print connector.csv()
Keyword,Visits,Pages/Visit,Avg. Time on Site,% New Visits,Bounce Rate,Visits,Subscribe,Solutions,Goal Conversion Rate,Per Visit Goal Value
juice analytics,356,5.935393258426966,314.061797752809,0.38764044642448425,0.29494380950927734,356,1.0,0.16292135417461395,1.1629213094711304,0.0
excel training,142,1.971830985915493,98.0774647887324,0.908450722694397,0.6901408433914185,142,1.0,0.0211267601698637,1.0211267471313477,0.0
excel charts,77,1.7922077922077921,95.0,0.9090909361839294,0.7792207598686218,77,1.0,0.03896103799343109,1.0389610528945923,0.0
excel skills,72,1.6527777777777777,75.29166666666667,0.9444444179534912,0.7083333134651184,72,1.0,0.0,1.0,0.0
colbert bump,70,1.3142857142857143,113.77142857142857,0.6428571343421936,0.8428571224212646,70,1.0,0.0,1.0,0.0

This function returns your report in a nice Excel-ready CSV format.

In [8]: print connector.parse_csv_as_dicts(convert_numbers=True)
[{'Avg. Time on Site': 314.06179775280901, 'Per Visit Goal Value': 0.0, 'Bounce Rate': 0.29494380950927734, 'Keyword': 'juice analytics', 'Visits': 356.0, 'Pages/Visit': 5.9353932584269664, 'Subscribe': 1.0, 'Solutions': 0.16292135417461395, '% New Visits': 0.38764044642448425, 'Goal Conversion Rate': 1.1629213094711304}, {'Avg. Time on Site': 98.077464788732399, 'Per Visit Goal Value': 0.0, 'Bounce Rate': 0.69014084339141846, 'Keyword': 'excel training', 'Visits': 142.0, 'Pages/Visit': 1.971830985915493, 'Subscribe': 1.0, 'Solutions': 0.021126760169863701, '% New Visits': 0.90845072269439697, 'Goal Conversion Rate': 1.0211267471313477}, {'Avg. Time on Site': 95.0, 'Per Visit Goal Value': 0.0, 'Bounce Rate': 0.77922075986862183, 'Keyword': 'excel charts', 'Visits': 77.0, 'Pages/Visit': 1.7922077922077921, 'Subscribe': 1.0, 'Solutions': 0.038961037993431091, '% New Visits': 0.90909093618392944, 'Goal Conversion Rate': 1.0389610528945923}, {'Avg. Time on Site': 75.291666666666671, 'Per Visit Goal Value': 0.0, 'Bounce Rate': 0.70833331346511841, 'Keyword': 'excel skills', 'Visits': 72.0, 'Pages/Visit': 1.6527777777777777, 'Subscribe': 1.0, 'Solutions': 0.0, '% New Visits': 0.94444441795349121, 'Goal Conversion Rate': 1.0}, {'Avg. Time on Site': 113.77142857142857, 'Per Visit Goal Value': 0.0, 'Bounce Rate': 0.84285712242126465, 'Keyword': 'colbert bump', 'Visits': 70.0, 'Pages/Visit': 1.3142857142857143, 'Subscribe': 1.0, 'Solutions': 0.0, '% New Visits': 0.6428571343421936, 'Goal Conversion Rate': 1.0}]

This function goes the extra step and converts the CSV into a list of dictionaries, one per row, for easier programmatic use. By default, all entries are returned as Python strings. Setting convert_numbers to True, as we did here, additionally parses each dictionary to turn all numbers into float values.
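For readers curious how such a conversion can work, here is a minimal Python 3 sketch built on the standard csv module. It is not pyGAPI's implementation: csv_to_dicts is a hypothetical stand-in, and the sample text is a trimmed version of the CSV shown earlier.

```python
import csv

# A trimmed sample of the keyword report CSV, with only three columns.
SAMPLE = """Keyword,Visits,Bounce Rate
juice analytics,356,0.29494380950927734
excel training,142,0.6901408433914185
"""


def csv_to_dicts(text, convert_numbers=False):
    """Parse CSV text into a list of dicts, optionally converting numbers to float."""
    rows = [dict(row) for row in csv.DictReader(text.splitlines())]
    if convert_numbers:
        for row in rows:
            for key, value in row.items():
                try:
                    row[key] = float(value)
                except ValueError:
                    pass  # Leave non-numeric fields, such as the keyword, as strings.
    return rows


rows = csv_to_dicts(SAMPLE, convert_numbers=True)
```

One side effect of keying rows by header name is that a column name appearing twice in the header, like Visits in the full report, ends up as a single dictionary key.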

In [9]: print connector.list_reports()
('ReferringSourcesReport', 'SearchEnginesReport', 'AllSourcesReport', 'KeywordsReport', 'CampaignsReport', 'AdVersionsReport', 'TopContentReport', 'ContentByTitleReport', 'ContentDrilldownReport', 'EntrancesReport', 'ExitsReport', 'GeoMapReport', 'LanguagesReport', 'HostnamesReport', 'SpeedsReport')

This returns a list of all the reports that I have successfully tested thus far. All site-specific reports should work. A couple of site-section-specific reports should be included in the next update of pyGAPI.

Google is great and will release a real API soon, but until then you can download pyGAPI.


13 comments (only the last 5 are shown)


August 7, 2008
Ludovic said:

Very nice work. Very useful to, let's say get your most visited pages without having to maintain parallel accounting. May I ask you to licence it to an OSS licence and put it on Google Code ? Would be great.


August 20, 2008
Sebastian said:

Hello,

it work well! Great.
How can i pull the "keyword" or "country" report for a specific URL?
(use segmention)

Thanks


September 5, 2008
Thierry said:

Awesome work !


April 21, 2009
Random said:

There is now an actual analytics API:
http://code.google.com/apis/analytics/docs/gdata/gdataDeveloperGuide.html


April 22, 2009
Sal said:

I wrote a Python API wrapper that I call 'degapi' for the new analytics API to replace this old code. I have yet to put up a post and link about it, but it can be found here: http://suryasev.github.com/python-degapi/

There is an automatic python code generator for this API at http://vascodegapi.juiceanalytics.com
