Vasco de GAPI: Google Analytics API Explorer

Are you ready to explore the Google Analytics API?

At Juice, we were very excited about the public release of the Google Analytics Data Export API. Our product Concentrate has been running on a hackish home-brew Google Analytics export tool since its release last November, and we were happy to be able to relaunch as a Customer Example of the Google Analytics Data Export API.

Today, we are releasing a new, free tool called Vasco de GAPI. Vasco is a web-based tool for exploring the API, for downloading complex slices of data through it, and even for automatically generating code that lets coders replicate the API calls in question. Instead of describing it in more detail, I am just going to demo it.

I am going to start with a relatively rare but curious piece of Google Analytics functionality. I keep track of who wrote each blog post using a Google Analytics user-defined setting that is set to the author's name for each post. Slicing our blog by author is useful for me as an employee: I can brag during my yearly review about how many visitors I bring in, or how many natural search visits we get for free as a result of my posting. For the demo, I'm going to discover the natural search keywords that bring traffic to my blog posts on the website.

Let's get started.

The first step is to authenticate using Google's OAuth system.

I select ga:keyword as a dimension.

ga:pageviews is the metric I am interested in. The results will automatically get sorted by the first metric, so I do not need to explicitly specify a sort value.

I set ga:userDefinedValue as a filter, and filter it to saluryasev, and select this last week as a reference point.

Here is the list of parameters that Vasco de GAPI is passing to Google.
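As a rough sketch, the selections above could map onto a data feed request along the following lines. The feed URL, the profile ID, and the dates are placeholders assumed for illustration; only the dimension, the metric, and the filter come from the steps described here, and `build_query` is a hypothetical helper, not part of Vasco.

```python
from datetime import date, timedelta
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed feed endpoint for the Data Export API; treat it as a placeholder.
FEED_URL = "https://www.google.com/analytics/feeds/data"


def build_query(profile_id, author, end_date, days=7):
    """Build a feed URL for the keywords driving pageviews to one author's posts."""
    start_date = end_date - timedelta(days=days - 1)
    params = {
        "ids": "ga:%s" % profile_id,  # hypothetical profile ID
        "dimensions": "ga:keyword",
        "metrics": "ga:pageviews",
        "filters": "ga:userDefinedValue==%s" % author,
        "start-date": start_date.isoformat(),
        "end-date": end_date.isoformat(),
        # No "sort" entry: as noted above, results are sorted by the first metric.
    }
    return FEED_URL + "?" + urlencode(params)


url = build_query("12345", "saluryasev", date(2009, 4, 30))
print(url)
```

Keeping the parameters in a plain dictionary makes it easy to swap the dimension or filter and compare the resulting request with what the tool displays.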

What are my results?

It turns out that of all my posts, the Google Trends API that I put out about a year ago drives the most natural traffic to our site. Hopefully, this will change with a few more blog posts, but this is still rather interesting data. I could target that specific audience with something Google-trendy. On an unrelated note, a slap to my face was that Zach's name sent fifteen users to my blogposts. Go figure. Sixteen users searched on my last name, and were probably looking for my more popular father.

To get at the rest of the data, I can click the download link at the bottom of the page or, for developers, another link downloads working code that will replicate this exact pull.

Vasco runs using an open source Python gdata wrapper for the API that can be downloaded here. This wrapper is powerful, and I will write another blogpost about it next week. It is plugged into the Google gdata module, and as such allows all forms of authentication available to gdata users, including OAuth, AuthSub, and clientside.

Hopefully, Vasco de GAPI can help all other potential explorers sail smoothly through the API. When it comes to data, Google is just a great company. They have had powerful APIs for most of their major services for years, and while the Analytics API is a latecomer, it is actually more powerful than the analytics interface itself. This sort of openness is something to be envied by all other analytics and web companies in the market.

By the way, please let me know if the explorer theme works well. It was a lot of fun working on a project with a slightly esoteric approach.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. All source code is released under a BSD License unless otherwise specified.

2 comments


May 1, 2009
Toby Murdock said:

really cool.

congrats zach & team. :-)


May 4, 2009
Dirnov said:

Amazing! Not clear for me, how offen you updating your www.juiceanalytics.com.
Dirnov






Real-World Tufte Graphics in 11 Lines of Code

Check out our followup post that describes how we created a downloadable Windows application and an Excel spreadsheet you can use to create these graphics.

One of the troubles with Tufte is the frustrating infeasibility of his approach to design for real people in business. One of his recommendations is to use Adobe Illustrator.

Adobe Illustrator is a big serious program that can do almost anything on the visual field (other than Photoshop an image). Most of my sparkline work was done in Illustrator. Fortunately all graphic designers and graphic design students have the program and know how to use it, so find a colleague who knows about graphic design.

Raise your hand if you have a graphic design assistant at your beck and call. I thought not.

One of the tools we use for rapid prototyping at Juice is NodeBox.

NodeBox is a Mac OS X application that lets you create 2D visuals (static, animated or interactive) using Python programming code and export them as a PDF or a QuickTime movie. NodeBox is free and well-documented.

All true. But it's more helpful to think of NodeBox as a free Adobe Illustrator that you can program in the world's easiest programming language. Oops, here's the right link.

I wanted to see if we could reproduce the following graph from The Visual Display of Quantitative Information, p 158.

Tufte Current Receipts Graphic

Here's the code. It's 11 lines of code if you exclude entering the data and setting things like fonts and colors.

size(500,700)
font('Palatino'); 
fontsize(12)  
stroke(0.4)  # a medium grey for lines
fill(0.2)    # a slightly darker grey for text  

# data = (label, first, last, start-label-fudge-factor, end-label-fudge-factor)

data = [ ('Sweden', 46.9, 57.4, 0., 0.),
         ('Netherlands', 44.0, 55.8, .3, 0.),
         ('Norway', 43.5, 52.2, 0., 0.),
         ('Britain', 40.7, 39.0, 0., 0.),
         ('France', 39.0, 43.4, 0., 0.6),
         ('Germany', 37.5, 42.9, 0., -0.4),
         ('Belgium', 35.2, 43.2, 0., 0.),
         ('Canada', 35.2, 35.8, .8, 0.4),
         ('Finland', 34.9, 38.2, -0.5, 0.),
         ('Italy', 30.4, 35.7, 0.3, -0.3),
         ('United States', 30.3, 32.5, -0.3, 0.),
         ('Greece', 26.8, 30.6, 0.4, 0.),
         ('Switzerland', 26.5, 33.2, -0.2, 0.1),
         ('Spain', 22.5, 27.1, 0., 0.3),
         ('Japan', 20.7, 26.6, 0., 0.), ]

text("Current Receipts of Government as a Percentage of "
      "Gross Domestic Product, 1970 and 1979", 20, 70, width=215)
text("1970", WIDTH*.28, HEIGHT*0.03)
text("1979", WIDTH*.68, HEIGHT*0.03)

def ypos(val):
    # calculate a vertical position by scaling between 10% and 90% 
    # of the height of the image
    return HEIGHT * (0.9 - 0.8 * (val - minval) / (maxval - minval))

# find the minimum and maximum values in the range

alldata = [d[1] for d in data] + [d[2] for d in data]
minval, maxval = min(alldata), max(alldata)

for label, start, end, startfudge, endfudge in data:
    align(RIGHT)
    text(label, 0, ypos(start+startfudge)+4, width=0.25*WIDTH)
    text("%0.1f" % start, 0.25*WIDTH, ypos(start+startfudge)+4, width=0.07*WIDTH)
    align(LEFT)
    text(label, WIDTH*.75, ypos(end+endfudge)+4)
    text("%0.1f" % end, 0.68*WIDTH, ypos(end+endfudge)+4, width=0.07*WIDTH)
    line(WIDTH*.33, ypos(start), WIDTH*.67, ypos(end))
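To make the scaling in ypos concrete, here is a small standalone sketch in plain Python (no NodeBox required). It assumes the same 500 by 700 canvas and uses only the Sweden and Japan rows, which happen to hold the overall maximum and minimum; `make_ypos` is a hypothetical helper added for illustration.

```python
WIDTH, HEIGHT = 500, 700  # same canvas as size(500,700) in the listing


def make_ypos(values, height=HEIGHT):
    """Return a function that maps a value to a y position between
    10% and 90% of the canvas height (y grows downward)."""
    minval, maxval = min(values), max(values)

    def ypos(val):
        return height * (0.9 - 0.8 * (val - minval) / (maxval - minval))

    return ypos


# Sweden (46.9, 57.4) and Japan (20.7, 26.6) bound the whole data set.
ypos = make_ypos([46.9, 57.4, 20.7, 26.6])
print(round(ypos(57.4), 1))  # largest value: 10% from the top
print(round(ypos(20.7), 1))  # smallest value: 90% from the top
```

Because y grows downward on the canvas, the formula subtracts the scaled value from 0.9 so that larger percentages end up higher on the page.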

Here's what the result looks like.

Tufte Current Receipts Graphic with NodeBox

We have some great followups to this planned for next week. We'll reimplement this code with the Python Imaging Library, which will open things up for Windows users. We have some great plans for mashing these graphics up with our just released Google Analytics API.



23 comments (only the last 5 are shown)


May 18, 2008
Kragen Javier Sitaker said:

Is there a way to get old-style numerals with NodeBox? I suppose you have to find an installed font on your Mac with old-style numerals.

Pradeep's processing.js demo is awesome, but from the screenshot lacks antialiasing. (I'm not yet a Firefox 3 Achiever.)


May 19, 2008
Luke said:

Dude, why reproduce the errors ("fudge factors") in the original?


May 26, 2008
The Dude said:

@Luke: Dude, the fudge factors are not errors. They are there so that the text labels do not overlap.


August 13, 2008
Michael Galloy said:

I made an IDL implementation; the results are here: http://michaelgalloy.com/wp-content/uploads/2008/08/receipts.png. It wasn't too bad to have it automatically compute the fudge factors (at least in simple cases).


January 29, 2009
Ahem. said:

I think you're missing the point Edward Tufte was making when he made his original chart. Because he took into consideration that the data was all going in the same direction (down) he was able to design a chart where it was pre-planned that there wouldn't be any x's or crossing lines.
(See http://nymag.com/daily/entertainment/2007/06/edward_tufte_and_the_triumph_o.html)

Edward Tufte would find another solution to the data above.






Kaizen and Juice 2.0

Kaizen may be the art of continuous improvement, but today we’re happy to showcase the art of discontinuous improvement. In one big bang, we’re introducing a new logo, a new website, and a new platform to deliver web services and tools to make your life better.

The new logo is the product of months of pixel pushing and brainstorming. I’ll detail the evolution of the logo in a future post, but for the moment I’ll leave you with a comparison of the old and new logos.

old Juice logo
new Juice logo

The website redesign is an effort to improve the “discoverability” of our site. Good articles were mouldering in the archives. It was hard to find old or popular articles. Search was barely existent. A follow up article will trace the evolution of the site design.

We built the new site using Python and Django. This is a dynamic platform that gives us a lot of power to add new features, tools, and applications. We’re excited about what we will be able to bring you—we have a whiteboard full of ideas just awaiting implementation.

The new site, while better, isn’t perfect. Despite our efforts, there may be links that don’t work or screencasts that neither screen nor cast. We'd love to hear your reaction to the new design. Please leave a comment to tell us what you think or if you find anything that's broken. We'll fix it right away. With your help, we’ll make this site and this community better in a process of continuous improvement—Kaizen.

We've gotten a lot of positive comments about the design. I wanted to thank rockbeatspaper, the web design consultants who worked with us to create this site. A great company and a terrific job.


30 comments (only the last 5 are shown)


May 13, 2007
Chris Gemignani said:

There were a few changes to the blog today that should make frequent readers happy.

- The writing page now shows recent posts and comments.
- A few Internet Explorer CSS problems have been cleaned up as well.


May 25, 2007
Jonah said:

A few big complaints:

1) Bookmarked pages no longer work (permalinks changed, no redirects).

2) Can't browse through archives start to finish. There are 21 posts from Jan 2005. I can see one at a time. And can't see more than a few titles ahead.

3) No dates on posts in archives, so it's tricky to know if links are in fact, the archives I'm looking for.

After 20 minutes of looking for a bookmark on animated scatterplots, I stumbled across it: http://www.juiceanalytics.com/writing/2005/6/

Sadly, under the new design, the animation isn't there. Instead I get code: [FLASH] http://www.juiceanalytics.com/flash/tigerwoodsfinal , 440, 430 [/FLASH]

Juice is usually right on the money with presentation. But you have deviated from standards. Blog standards: date based archiving, categorical archiving, (scrolling across all stories in a given archive, abbreviated or full text), and individual archiving.

You've replaced standards with some filing system that pushes the most popular archives into view at the expense of all others.


June 4, 2007
David Parker said:

I've tried to get used to the new look - I have.

The functional layout is fine. However, I miss the hip looking photo banner. And the bold green titles look too squeezed together, heavily aliased and generally cheap and ugly.


August 2, 2007
Jon Peltier said:

I wondered what happened to this blog. The RSS feeds just stopped, but I never got around to visiting the site itself. Finally I found it today from Chris' post in another blog, and discovered that I'd missed several months of discussion. You should have sent out an announcement using the old RSS feed.

My first impressions of the new layout are positive, by the way.


August 15, 2007
kcmarshall said:

I spotted a bug and thought I'd report it.

The post-specific topic links don't work properly. For example, on this post the topics are "Design, Juice, Python".

The Python link is:
http://www.juiceanalytics.com/writing/?/writing/topics/python/
but should be:
http://www.juiceanalytics.com/writing/topics/python/

Regards!
Kevin






Python Geocoding Help

Yahoo recently released a nifty geocoder API that's free for small (<50,000 lookups per day), non-commercial applications. Rasmus Lerdorf (Yahoo's PHP king) has written a nice introduction to using this geocoder in your PHP apps. In that spirit, here's a cheap and cheerful Python class that we use to geocode addresses.

from xml.dom.minidom import parse
from urllib.parse import urlencode
from urllib.request import urlopen


class GeocoderError(Exception):
    """Raised when the lookup or the parsing of its response fails."""


class Geocoder:
    """
    Look up a location using the Yahoo geocoding api.
    Requires a Yahoo appid which can be obtained at:
    http://developer.yahoo.net/faq/index.html#appid
    Documentation for the Yahoo geocoding api can be found at:
    http://developer.yahoo.net/maps/rest/V1/geocode.html
    """

    def __init__(self, appid, address_str):
        self.address_str = address_str
        self.addresses = []
        self.result_count = 0
        parms = {'appid': appid, 'location': address_str}

        try:
            url = 'http://api.local.yahoo.com/MapsService/V1/geocode?' + urlencode(parms)
            # parse the xml contents of the url into a dom
            dom = parse(urlopen(url))
            results = dom.getElementsByTagName('Result')
            self.result_count = len(results)
            for result in results:
                # one dictionary per match, seeded with the Result attributes
                d = {'precision': result.getAttribute('precision'),
                     'warning': result.getAttribute('warning')}
                for itm in result.childNodes:
                    # if precision is zip, Address childNode will not exist
                    if itm.childNodes:
                        d[itm.nodeName] = itm.childNodes[0].data
                    else:
                        d[itm.nodeName] = ''
                self.addresses.append(d)
        except Exception as exc:
            raise GeocoderError(str(exc)) from exc

    def __repr__(self):
        s = "Original address:\n%s\n\n" % self.address_str
        s += "%d match(s) found:\n\n" % self.result_count
        for addr in self.addresses:
            s += """Match precision: %(precision)s
            Location: (%(Latitude)s,%(Longitude)s)
            %(Address)s
            %(City)s, %(State)s %(Zip)s
        """ % addr
        return s


if __name__ == "__main__":
    sample_addresses = ['555 Grove St. Herndon,VA 20170',
                        '1234 Greeley blvd, springfeld, va, 22152',
                        '50009']
    for addr in sample_addresses:
        g = Geocoder('YahooDemo', addr)
        print('-' * 80)
        print(g)

All you need to use this is a Yahoo application id.
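If you want to see how the parsing step behaves without calling the service, the sketch below runs the same dictionary-building loop over a hand-written XML sample. The sample is an assumption modeled on the fields the class reads (Latitude, Longitude, Address, City, State, Zip), with made-up values; the real response may differ. `parse_results` is a hypothetical helper, and unlike the class it skips the whitespace text nodes between elements.

```python
from xml.dom.minidom import parseString

# Hand-written sample with made-up values; the real service's XML may differ.
SAMPLE_XML = """<ResultSet>
  <Result precision="zip" warning="">
    <Latitude>41.65</Latitude>
    <Longitude>-93.47</Longitude>
    <Address></Address>
    <City>Altoona</City>
    <State>IA</State>
    <Zip>50009</Zip>
  </Result>
</ResultSet>"""


def parse_results(xml_text):
    """Turn each Result element into a dict of its attributes and child values."""
    dom = parseString(xml_text)
    addresses = []
    for result in dom.getElementsByTagName("Result"):
        d = {"precision": result.getAttribute("precision"),
             "warning": result.getAttribute("warning")}
        for itm in result.childNodes:
            if itm.nodeType != itm.ELEMENT_NODE:
                continue  # skip whitespace text nodes between elements
            # An empty element (such as Address for a zip-only match) has no children.
            d[itm.nodeName] = itm.childNodes[0].data if itm.childNodes else ""
        addresses.append(d)
    return addresses


addresses = parse_results(SAMPLE_XML)
print(addresses[0]["City"], addresses[0]["Zip"])
```

Separating the parsing from the network call also makes it possible to test the class logic against saved responses.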

You now have four different ways to geocode your company's vital address. If you have suggestions or improvements, let us know. This code is public domain.


1 comment


April 4, 2010
brian said:

your code does not work.






Restoring romance to the sports page

Why do our sports pages look like this?

Instead of this?

Eastern Conference
    Atlantic: Nets, 76ers, Celtics, Raptors, Knicks
    Central: Pistons, Cavaliers, Bucks, Pacers, Bulls
    Southeast: Heat, Wizards, Magic, Hawks, Bobcats
Western Conference
    Pacific: Suns, Clippers, Lakers, Warriors, Kings
    Southwest: Spurs, Mavericks, Grizzlies, Hornets, Rockets
    Northwest: Nuggets, Timberwolves, Jazz, SuperSonics, Trail Blazers

Those green and red lines are "sparklines"--a term invented, I believe, by Edward Tufte. They are little, word-size graphics that show a trend more quickly and clearly than one could describe it. In this case, each sparkline shows an NBA team's record throughout the season; a green up bar is a win, and a red down bar is a loss.
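The win/loss encoding is simple enough to sketch in a few lines of Python. This is only a text approximation, assuming game results arrive as a string of "W" and "L" characters; it uses Unicode half-block characters in place of colored bars, the season string is made up, and `win_loss_sparkline` and `record` are hypothetical helper names.

```python
def win_loss_sparkline(results):
    """Map a string of 'W'/'L' results to up and down block characters."""
    bars = {"W": "\u2580", "L": "\u2584"}  # upper half block, lower half block
    return "".join(bars[r] for r in results.upper())


def record(results):
    """Return (wins, losses) for the same result string."""
    results = results.upper()
    return results.count("W"), results.count("L")


season = "WWLWLLWWWW"  # made-up results, oldest game first
print(win_loss_sparkline(season), "%d-%d" % record(season))
```

Printing the plain record next to the sparkline keeps a quotable number alongside the trend.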

In less space than a standard standings listing, we see the sustained excellence of the Pistons, the steadiness of the Spurs and Mavericks, the Raptors recovering from their awful start, the wheels falling off the Pacers, the mystery that is the Nets. These large multiples of small graphics recover some of the romance and drama that is a season.

For a really beautiful example of sparklines applied to sports, look to Tufte's professional example here. If you know Python, Grig Gheorghiu has written a simple tool for generating sparklines.


10 comments (only the last 5 are shown)


August 3, 2006
JimJJewett said:

If they can read the table in a newspaper, they can read the graphic.

Some disadvantages that I notice.

(1) It is harder for someone else to read it to you (or OCR it, or index it, or ...).

(2) The sparkline relies heavily on color, and color newspaper ink costs more.

(3) You don't have a magical number (like .524) to throw around.

(4) It is harder to display multiple types of information. For example, the sparklines above do not display which games were home/road or in-division, so those percentages are lost.


August 3, 2006
Chris said:

Thanks Jim.

Jeremiah McNichols raised a lot of similar points in this post: http://thinkingpictures.blogspot.com/2006/07/sparklines-handle-with-care.html.

I don't want people to take the exact sparkline I'm showing too literally: the sparkline could be redesigned to show the home/road data, for instance. Personally, I think disadvantage #3 matters most.


August 23, 2006
Wayne Frazer said:

As a former sports editor and newspaper publisher, I can almost guarantee that system would never fly mainly for the second reason given by Jim above.

To be able to use spot color without running up astronomical prices, you have to have color running on another page adjacent in the printing process, i.e. 1/8/9/16 in the web printing process. Putting color willy nilly throughout the paper would drive cost through the roof.

Also, space is at a huge premium. While I like the sparkline's ability to convey the momentum of the team, the amount of space it would take to be clearly visible on low-quality newsprint paper would be tremendous, and it doesn't tell any other story than trend.


September 14, 2006
Pete Jelliffe said:

You don't need color to show win/loss, you can simply show up down. But while I like the graphic, it's easier to compare relative records and streaks, you can't quote it. You can't rattle off these stats to friends during a conversation.

I would definitely include summary stats at the end like total win/loss, games back and win %.


September 21, 2006
Tom Snider-Lotz said:

I love sparklines, and use them at work. But for the sports page, as a fan, I want to know how many games behind my team is, especially as the end of the season approaches. I want to compare numbers across divisions if wild card slots are at stake.

Sparklines would make a great supplement to the table, but not a replacement. Tufte himself makes a case for using tables when the data warrant it.
