python

Vasco de GAPI: Google Analytics API Explorer

Update: Thanks for checking this out! However, since Google created and now maintains an updated version of a data feed explorer, we have taken the Vasco de GAPI application offline.

Are you ready to explore the Google Analytics API?

At Juice, we were very excited about the public release of the Google Analytics Data Export API. Our product Concentrate has been running on a hackish home-brew Google Analytics export tool since its release last November, and we were happy to be able to relaunch as a Customer Example of the Google Analytics Data Export API.

Today, we are releasing a new, free tool called Vasco de GAPI. Vasco is a web-based tool for exploring the API, downloading complex slices of data, and even automatically generating code that lets coders easily replicate the API calls in question. Instead of describing it in more detail, I am just going to demo it.

I am going to start with a relatively rare but curious feature of Google Analytics. I keep track of who wrote each blog post using a Google Analytics user-defined setting that is set to the author’s name for each specific post. Slicing our blog by author can be cool for me as an employee: I can brag during my yearly review about how many visitors I bring in or how many natural search visits we get for free as a result of my posting. For the demo, I’m going to discover the natural keywords that bring traffic to my blog posts on the website.

Let’s get started.

The first step is to authenticate using Google’s OAuth system.

I select ga:keyword as a dimension.

ga:pageviews is the metric I am interested in. The results will automatically get sorted by the first metric, so I do not need to explicitly specify a sort value.

I add a filter on ga:userDefinedValue, set it to saluryasev, and select the last week as the date range.

Here is the list of parameters that Vasco de GAPI is passing to Google.
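For readers who want to see what such a parameter list might look like in code, here is a rough sketch in Python. It is not the code Vasco generates. The endpoint URL and the parameter names (ids, dimensions, metrics, filters, start-date, end-date) are written from memory of the now-retired Data Export API and should be treated as assumptions, and the profile ID and dates are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint of the (now retired) Data Export API data feed.
FEED_URL = "https://www.google.com/analytics/feeds/data"


def build_query(table_id, dimensions, metrics, filters, start_date, end_date):
    """Return a data feed URL for one query (hypothetical helper)."""
    params = [
        ("ids", table_id),
        ("dimensions", ",".join(dimensions)),
        ("metrics", ",".join(metrics)),
        ("filters", filters),
        # No explicit sort: the demo relies on the default ordering.
        ("start-date", start_date),
        ("end-date", end_date),
    ]
    return FEED_URL + "?" + urlencode(params)


url = build_query(
    table_id="ga:1234567",                 # hypothetical profile ID
    dimensions=["ga:keyword"],
    metrics=["ga:pageviews"],
    filters="ga:userDefinedValue==saluryasev",
    start_date="2009-05-01",               # hypothetical one-week window
    end_date="2009-05-07",
)

# Decode the query string again to check what would be sent.
query = parse_qs(urlsplit(url).query)
print(query)
```

Keeping the parameters in a list of pairs makes it easy to add or drop one (a sort, a result limit) without touching the URL-building code.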

What are my results?

It turns out that of all my posts, the Google Trends API that I put out about a year ago drives the most natural traffic to our site. Hopefully, this will change with a few more blog posts, but this is still rather interesting data. I could target that specific audience with something Google-trendy. On an unrelated note, a slap to my face was that Zach’s name sent fifteen users to my blogposts. Go figure. Sixteen users searched on my last name, and were probably looking for my more popular father.

To get at the rest of the data, I can click the download link at the bottom of the page or, for developers, another link downloads working code that will replicate this exact pull.

Vasco runs using an open source Python gdata wrapper for the API that can be downloaded here. This wrapper is powerful, and I will write another blogpost about it next week. It is plugged into the Google gdata module, and as such allows all forms of authentication available to gdata users, including OAuth, AuthSub, and clientside.

Hopefully, Vasco de GAPI can help all other potential explorers sail smoothly through the API. When it comes to data, Google is just a great company. They have had powerful APIs for most of their major services for years, and while the Analytics API is a latecomer, it actually is more powerful than the analytics interface itself. This sort of openness is something to be envied by all other analytics and web companies in the market.

By the way, please let me know if the explorer theme works well. It was a lot of fun working on a project with a slightly esoteric approach.

Real-World Tufte Graphics in 11 Lines of Code

Check out our followup post that describes how we created a downloadable Windows application and an Excel spreadsheet you can use to create these graphics.

One of the troubles with Tufte is the frustrating infeasibility of his approach to design for real people in business. One of his recommendations is to use Adobe Illustrator.

Adobe Illustrator is a big serious program that can do almost anything on the visual field (other than Photoshop an image). Most of my sparkline work was done in Illustrator. Fortunately all graphic designers and graphic design students have the program and know how to use it, so find a colleague who knows about graphic design.

Raise your hand if you have a graphic design assistant at your beck and call. I thought not.

One of the tools we use for rapid prototyping at Juice is NodeBox.

NodeBox is a Mac OS X application that lets you create 2D visuals (static, animated or interactive) using Python programming code and export them as a PDF or a QuickTime movie. NodeBox is free and well-documented.

All true. But it’s more helpful to think of NodeBox as a free Adobe Illustrator that you can program in the world’s easiest programming language. Oops, here’s the right link.

I wanted to see if we could reproduce the following graph from The Visual Display of Quantitative Information, p 158.

Tufte Current Receipts Graphic

Here’s the code. It’s 11 lines of code if you exclude entering the data and setting things like fonts and colors.

size(500, 700)
font('Palatino'); fontsize(12)
stroke(0.4)  # a medium grey for lines
fill(0.2)    # a slightly darker grey for text

# data = (label, first, last, label-fudge-factor)
data = [
    ('Sweden', 46.9, 57.4, 0., 0.),
    ('Netherlands', 44.0, 55.8, .3, 0.),
    ('Norway', 43.5, 52.2, 0., 0.),
    ('Britain', 40.7, 39.0, 0., 0.),
    ('France', 39.0, 43.4, 0., 0.6),
    ('Germany', 37.5, 42.9, 0., -0.4),
    ('Belgium', 35.2, 43.2, 0., 0.),
    ('Canada', 35.2, 35.8, .8, 0.4),
    ('Finland', 34.9, 38.2, -0.5, 0.),
    ('Italy', 30.4, 35.7, 0.3, -0.3),
    ('United States', 30.3, 32.5, -0.3, 0.),
    ('Greece', 26.8, 30.6, 0.4, 0.),
    ('Switzerland', 26.5, 33.2, -0.2, 0.1),
    ('Spain', 22.5, 27.1, 0., 0.3),
    ('Japan', 20.7, 26.6, 0., 0.),
]

text("Current Receipts of Goverment as a Percentage of "
     "Gross Domestic Product, 1970 and 1979", 20, 70, width=215)
text("1970", WIDTH*.28, HEIGHT*0.03)
text("1979", WIDTH*.68, HEIGHT*0.03)

def ypos(val):
    # calculate a vertical position by scaling between 10% and 90%
    # of the height of the image
    return HEIGHT * (0.9 - 0.8 * (val - minval) / (maxval - minval))

# find the minimum and maximum values in the range
alldata = [d[1] for d in data] + [d[2] for d in data]
minval, maxval = min(alldata), max(alldata)

for label, start, end, startfudge, endfudge in data:
    align(RIGHT)
    text(label, 0, ypos(start + startfudge) + 4, width=0.25*WIDTH)
    text("%0.1f" % start, 0.25*WIDTH, ypos(start + startfudge) + 4, width=0.07*WIDTH)
    align(LEFT)
    text(label, WIDTH*.75, ypos(end + endfudge) + 4)
    text("%0.1f" % end, 0.68*WIDTH, ypos(end + endfudge) + 4, width=0.07*WIDTH)
    line(WIDTH*.33, ypos(start), WIDTH*.67, ypos(end))
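The one piece of real arithmetic in the script is ypos, which maps a data value onto the page. Here is that scaling pulled out into plain Python so it can be tried without NodeBox. The make_ypos wrapper is my own name for illustration, and the sample values are just the Sweden and Japan rows from the data above.

```python
def make_ypos(values, height):
    """Return a function mapping a data value to a vertical pixel position.

    The smallest value lands at 90% of the height (near the bottom, because
    y grows downward on the canvas) and the largest at 10% (near the top).
    """
    minval, maxval = min(values), max(values)

    def ypos(val):
        return height * (0.9 - 0.8 * (val - minval) / (maxval - minval))

    return ypos


# Both columns of the chart share one scale, as in the NodeBox script.
ypos = make_ypos([46.9, 57.4, 20.7, 26.6], height=700)
for value in (20.7, 46.9, 57.4):
    print(value, round(ypos(value), 1))
```

Because the 1970 and 1979 columns share a single minimum and maximum, the slope of each connecting line is directly comparable across countries.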

Here’s what the result looks like.

Tufte Current Receipts Graphic with NodeBox

We have some great followups to this planned for next week. We’ll reimplement this code with the Python Imaging Library, which will open things up for Windows users. We also have plans for mashing these graphics up with our just-released Google Analytics API.

Check out our followup post that describes how we created a downloadable Windows application you can use to create these graphics.

Juiced Google Analytics Python API

Due to the release of an official Google Analytics Data Export API, this module is now deprecated. We have an alternative Python module based upon the real analytics API here, and an exploring tool with an automatic code generation capability here.

It is not official. It is not from Google. It is, however, very functional and very here. I present to you pyGAPI, the Juiced Google Analytics Python API. This module allows you to pull information from your incarnation of Google Analytics and employ it programmatically in your reporting code.

Let us use IPython to peek through some code using pyGAPI.

[sourcecode language="python" light="true"]
In [3]: from datetime import date
In [4]: import pyGAPI
In [5]: connector = pyGAPI.pyGAPI(username, password, website_id="1234567")
[/sourcecode]

Here we create a pyGAPI object. Behind the scenes, pyGAPI logs into Google Analytics and downloads an identifier cookie. website_id is optional. If omitted, pyGAPI accesses the first website on the account’s list. To get a list of all the site IDs to which your account has access, run the function connector.list_sites().

[sourcecode language="python" light="true"]
In [6]: connector.download_report('KeywordsReport', (date(2008,3,10), date(2008,3,31)), limit=5)
[/sourcecode]

This downloads a report into your pyGAPI object. KeywordsReport is the name of the report. It is followed by a tuple containing the start and end dates as Python date objects. limit is an optional parameter that specifies the number of entries that pyGAPI should pull down. By default, it will pull in all the entries up to a maximum of 10000. Lowering this number will certainly improve performance. The entries returned are ranked by Visits, so you should get the most significant values of the bunch.

[sourcecode language="python" light="true"]
In [7]: print connector.csv()
Keyword,Visits,Pages/Visit,Avg. Time on Site,% New Visits,Bounce Rate,Visits,Subscribe,Solutions,Goal Conversion Rate,Per Visit Goal Value
juice analytics,356,5.935393258426966,314.061797752809,0.38764044642448425,0.29494380950927734,356,1.0,0.16292135417461395,1.1629213094711304,0.0
excel training,142,1.971830985915493,98.0774647887324,0.908450722694397,0.6901408433914185,142,1.0,0.0211267601698637,1.0211267471313477,0.0
excel charts,77,1.7922077922077921,95.0,0.9090909361839294,0.7792207598686218,77,1.0,0.03896103799343109,1.0389610528945923,0.0
excel skills,72,1.6527777777777777,75.29166666666667,0.9444444179534912,0.7083333134651184,72,1.0,0.0,1.0,0.0
colbert bump,70,1.3142857142857143,113.77142857142857,0.6428571343421936,0.8428571224212646,70,1.0,0.0,1.0,0.0
[/sourcecode]

This function displays your report in a nice Excel-ready CSV format.

[sourcecode language="python" light="true"]
In [8]: print connector.parse_csv_as_dicts(convert_numbers=True)
[{'Avg. Time on Site': 314.06179775280901, 'Per Visit Goal Value': 0.0, 'Bounce Rate': 0.29494380950927734, 'Keyword': 'juice analytics', 'Visits': 356.0, 'Pages/Visit': 5.9353932584269664, 'Subscribe': 1.0, 'Solutions': 0.16292135417461395, '% New Visits': 0.38764044642448425, 'Goal Conversion Rate': 1.1629213094711304},
 {'Avg. Time on Site': 98.077464788732399, 'Per Visit Goal Value': 0.0, 'Bounce Rate': 0.69014084339141846, 'Keyword': 'excel training', 'Visits': 142.0, 'Pages/Visit': 1.971830985915493, 'Subscribe': 1.0, 'Solutions': 0.021126760169863701, '% New Visits': 0.90845072269439697, 'Goal Conversion Rate': 1.0211267471313477},
 {'Avg. Time on Site': 95.0, 'Per Visit Goal Value': 0.0, 'Bounce Rate': 0.77922075986862183, 'Keyword': 'excel charts', 'Visits': 77.0, 'Pages/Visit': 1.7922077922077921, 'Subscribe': 1.0, 'Solutions': 0.038961037993431091, '% New Visits': 0.90909093618392944, 'Goal Conversion Rate': 1.0389610528945923},
 {'Avg. Time on Site': 75.291666666666671, 'Per Visit Goal Value': 0.0, 'Bounce Rate': 0.70833331346511841, 'Keyword': 'excel skills', 'Visits': 72.0, 'Pages/Visit': 1.6527777777777777, 'Subscribe': 1.0, 'Solutions': 0.0, '% New Visits': 0.94444441795349121, 'Goal Conversion Rate': 1.0},
 {'Avg. Time on Site': 113.77142857142857, 'Per Visit Goal Value': 0.0, 'Bounce Rate': 0.84285712242126465, 'Keyword': 'colbert bump', 'Visits': 70.0, 'Pages/Visit': 1.3142857142857143, 'Subscribe': 1.0, 'Solutions': 0.0, '% New Visits': 0.6428571343421936, 'Goal Conversion Rate': 1.0}]
[/sourcecode]

This function goes the extra step and converts the CSV into a list of dictionaries, one per row, for easier programmatic use. By default, all entries will be returned as Python strings. Setting convert_numbers to True, as we did here, will additionally parse each dictionary to turn all numbers into float values.
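If you are curious how a conversion like this can work, here is a minimal sketch using only the standard library. It is not pyGAPI's actual implementation. The function below merely borrows the name and the convert_numbers flag for illustration, and the sample is a trimmed copy (fewer columns and rows) of the report shown earlier.

```python
import csv
import io

# Trimmed copy of the keyword report above: four columns, two rows.
SAMPLE = """Keyword,Visits,Pages/Visit,Bounce Rate
juice analytics,356,5.935393258426966,0.29494380950927734
excel training,142,1.971830985915493,0.6901408433914185
"""


def to_number(value):
    """Return value as a float if it parses as one, else unchanged."""
    try:
        return float(value)
    except ValueError:
        return value  # non-numeric cells such as the keyword stay strings


def parse_csv_as_dicts(text, convert_numbers=False):
    """Turn CSV text into a list of dicts, one per data row."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        if convert_numbers:
            row = {key: to_number(value) for key, value in row.items()}
        rows.append(dict(row))
    return rows


rows = parse_csv_as_dicts(SAMPLE, convert_numbers=True)
print(rows[0]["Keyword"], rows[0]["Visits"])
```

One thing to watch with the full report: its header lists Visits twice, and a dictionary keyed by column name can only keep one of the two.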

[sourcecode language="python" light="true"]
In [9]: print connector.list_reports()
('ReferringSourcesReport', 'SearchEnginesReport', 'AllSourcesReport', 'KeywordsReport', 'CampaignsReport', 'AdVersionsReport', 'TopContentReport', 'ContentByTitleReport', 'ContentDrilldownReport', 'EntrancesReport', 'ExitsReport', 'GeoMapReport', 'LanguagesReport', 'HostnamesReport', 'SpeedsReport')
[/sourcecode]

This gets a list of all the reports that I have successfully tested thus far. All site-specific reports should work. A couple site-section specific reports should be included in the next update of pyGAPI.

Google is great and will release a real API soon, but until then you can download pyGAPI.

Setting DJANGO_SETTINGS_MODULE

Here’s a bash function I use for Django development to quickly set DJANGO_SETTINGS_MODULE.

function setdsm() { 
    # add the current directory and the parent directory to PYTHONPATH
    # sets DJANGO_SETTINGS_MODULE
    export PYTHONPATH=$PYTHONPATH:$PWD/..
    export PYTHONPATH=$PYTHONPATH:$PWD
    if [ -z "$1" ]; then 
        x=${PWD##*/}    # name of the current directory
        export DJANGO_SETTINGS_MODULE=$x.settings
    else    
        export DJANGO_SETTINGS_MODULE=$1 
    fi

    echo "DJANGO_SETTINGS_MODULE set to $DJANGO_SETTINGS_MODULE"
}

I put this in my .bash_profile. Then a quick setdsm sets DJANGO_SETTINGS_MODULE to the settings.py in the current directory and adds the current directory and its parent to PYTHONPATH.
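For anyone who would rather read the naming rule in Python, here is a tiny sketch of what setdsm does when called with no argument. The helper name and the project path are made up for illustration.

```python
import posixpath


def default_settings_module(cwd):
    """Mimic setdsm with no argument: '<current directory name>.settings'."""
    project = posixpath.basename(cwd.rstrip("/"))
    return project + ".settings"


# Hypothetical project checkout.
module = default_settings_module("/home/me/mysite")
print(module)
```

The parent directory has to be on PYTHONPATH for this to work, because the module name starts with the project directory itself.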

Kaizen and Juice 2.0

Kaizen may be the art of continuous improvement, but today we’re happy to showcase the art of discontinuous improvement. In one big bang, we’re introducing a new logo, a new website, and a new platform to deliver web services and tools to make your life better.

The new logo is the product of months of pixel pushing and brainstorming. I’ll detail the evolution of the logo in a future post, but for the moment I’ll leave you with a comparison of the old and new logos.

old Juice logo

new Juice logo

The website redesign is an effort to improve the “discoverability” of our site. Good articles were mouldering in the archives. It was hard to find old or popular articles. Search was barely existent. A follow up article will trace the evolution of the site design.

We built the new site using Python and Django. This is a dynamic platform that gives us a lot of power to add new features, tools, and applications. We’re excited about what we will be able to bring you—we have a whiteboard full of ideas just awaiting implementation.

The new site, while better, isn’t perfect. Despite our efforts, there may be links that don’t work or screencasts that neither screen nor cast. We’d love to hear your reaction to the new design. Please leave a comment to tell us what you think or if you find anything that’s broken. We’ll fix it right away. With your help, we’ll make this site and this community better in a process of continuous improvement—Kaizen.

We’ve gotten a lot of positive comments about the design. I wanted to thank rockbeatspaper, the web design consultants who worked with us to create this site. A great company and a terrific job.

Python Geocoding Help

Yahoo recently released a nifty geocoder API that’s free for small (<50,000 lookups per day), non-commercial applications. Rasmus Lerdorf (Yahoo’s PHP king) has written a nice introduction to using this geocoder in your PHP apps. In that spirit, here’s a cheap and cheerful Python class that we use to geocode addresses.

# Python 2 (urllib.urlencode / urllib.urlopen and the print statement)
from xml.dom.minidom import parse
import urllib


class GeocoderError(Exception):
    """Raised when the lookup or the parsing of its response fails."""


class Geocoder:
    """
    Look up a location using the Yahoo geocoding api
    Requires a Yahoo appid which can be obtained at:
    http://developer.yahoo.net/faq/index.html#appid
    Documentation for the Yahoo geocoding api can be found at:
    http://developer.yahoo.net/maps/rest/V1/geocode.html
    """

    def __init__(self, appid, address_str):
        self.address_str = address_str
        self.addresses = []
        self.result_count = 0
        parms = {'appid': appid, 'location': address_str}

        try:
            url = 'http://api.local.yahoo.com/MapsService/V1/geocode?' + urllib.urlencode(parms)
            # parse the xml contents of the url into a dom
            dom = parse(urllib.urlopen(url))
            results = dom.getElementsByTagName('Result')
            self.result_count = len(results)
            for result in results:
                d = {'precision': result.getAttribute('precision'),
                     'warning': result.getAttribute('warning')}
                for itm in result.childNodes:
                    # if precision is zip, Address childNode will not exist
                    if itm.childNodes:
                        d[itm.nodeName] = itm.childNodes[0].data
                    else:
                        d[itm.nodeName] = ''
                self.addresses.append(d)
        except Exception:
            # string exceptions are no longer valid, so raise a real class
            raise GeocoderError("lookup failed for %r" % address_str)

    def __repr__(self):
        s = "Original address:\n%s\n\n" % self.address_str
        s += "%d match(s) found:\n\n" % self.result_count
        for addr in self.addresses:
            s += """Match precision: %(precision)s
Location: (%(Latitude)s,%(Longitude)s)
%(Address)s
%(City)s, %(State)s %(Zip)s
""" % addr
        return s


if __name__ == "__main__":
    sample_addresses = ['555 Grove St. Herndon,VA 20170',
                        '1234 Greeley blvd, springfeld, va, 22152',
                        '50009']
    for addr in sample_addresses:
        g = Geocoder('YahooDemo', addr)
        print '-' * 80
        print g

All you need to use this is a Yahoo application id.
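If you want to poke at the parsing step without calling the service, here is a small offline sketch of the same idea. The XML below is a made-up response in the general shape the class expects (a Result element with precision and warning attributes and one child element per field). Its values are placeholders, not real geocoder output, and parse_results is a hypothetical helper rather than part of the class.

```python
from xml.dom.minidom import parseString

# Made-up response for illustration only; values are placeholders.
SAMPLE_XML = """<ResultSet>
  <Result precision="zip" warning="">
    <Latitude>0.0</Latitude>
    <Longitude>0.0</Longitude>
    <Address></Address>
    <City>Exampleville</City>
    <State>XX</State>
    <Zip>00000</Zip>
  </Result>
</ResultSet>"""


def parse_results(xml_text):
    """Return one dict per <Result> element in the response."""
    addresses = []
    dom = parseString(xml_text)
    for result in dom.getElementsByTagName("Result"):
        d = {"precision": result.getAttribute("precision"),
             "warning": result.getAttribute("warning")}
        for itm in result.childNodes:
            if itm.nodeType != itm.ELEMENT_NODE:
                continue  # skip the whitespace text nodes between elements
            # An empty element such as <Address></Address> has no text child.
            d[itm.nodeName] = itm.childNodes[0].data if itm.childNodes else ""
        addresses.append(d)
    return addresses


addresses = parse_results(SAMPLE_XML)
print(addresses[0])
```

Filling empty fields with an empty string keeps every key present, so a format string that expects an Address still works when only a zip code was matched.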

You now have four different ways to geocode your company’s vital address. If you have suggestions or improvements, let us know. This code is public domain.

Restoring romance to the sports page

Why do our sports pages look like this?

Instead of this?

Eastern Conference

Atlantic

Nets

76ers

Celtics

Raptors

Knicks

Central

Pistons

Cavaliers

Bucks

Pacers

Bulls

Southeast

Heat

Wizards

Magic

Hawks

Bobcats

Western Conference

Pacific

Suns

Clippers

Lakers

Warriors

Kings

Southwest

Spurs

Mavericks

Grizzlies

Hornets

Rockets

Northwest

Nuggets

Timberwolves

Jazz

SuperSonics

Trail Blazers

Those green and red lines are "sparklines"--a term invented, I believe, by Edward Tufte. They are little, word-size graphics that show a trend more quickly and clearly than one could describe it. In this case, each sparkline shows an NBA team’s record throughout the season; a green up bar is a win, and a red down bar is a loss.

In less space than a standard standings listing, we see the sustained excellence of the Pistons, the steadiness of the Spurs and Mavericks, the Raptors recovering from their awful start, the wheels falling off the Pacers, the mystery that is the Nets. These large multiples of small graphics recover some of the romance and drama that is a season.

For a really beautiful example of sparklines applied to sports, look to Tufte’s professional example here. If you know Python, Grig Gheorghiu has written a simple tool for generating sparklines.
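To get a feel for the win/loss encoding without any graphics library, here is a toy text version. It is not Grig's tool and it is nowhere near as pretty as the real thing. The function name and the run of results are made up for illustration.

```python
def winloss_sparkline(results):
    """Map a string of 'W'/'L' results to up and down marks."""
    marks = {
        "W": "\u2580",  # upper half block: a win points up
        "L": "\u2584",  # lower half block: a loss points down
    }
    return "".join(marks[r] for r in results)


# Hypothetical stretch of nine games.
print(winloss_sparkline("WWLWLLWWW"))
```

Even in this crude form, streaks show up as flat runs and a team trading wins and losses shows up as a zigzag.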