Programmatic Google Trends API

Yesterday, Google released an update to their popular Google Trends tool. There are improvements over the previous version, but the biggest new feature is a shiny new button that lets you download all your data as a CSV file. This is a very cool enhancement. Where Google Trends used to be a geeky toy, it can now be integrated into analysts' reports and, with that, edge its way onto managers' desks.

This Python module is a quasi-API that makes it easier to authenticate into Google Trends, for those who want to squeeze an extra level of functionality out of their data. The advantage of programmatic access is that the data can be automatically trended and merged. It can be snuck into a 9:00 AM daily email to the VP of Marketing so that she knows to ramp up Google AdWords campaigns for some specific keyword. Also, by programmatically pulling multiple reports, it is possible to build up a wealth of data not visible in a single report. Using one keyword as a benchmark to merge multiple reports, we can do a meaningful comparison across tens or hundreds of relevant keywords.
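As a rough illustration of the benchmark idea, here is a minimal sketch. It does not use pyGTrends at all. It assumes each report has already been parsed into a dictionary of keyword to values, that every report contains the same benchmark keyword over the same dates, and that rescaling by the benchmark's total is an acceptable way to line reports up. The function name merge_on_benchmark and the sample numbers are made up for this example.

```python
def merge_on_benchmark(reports, benchmark):
    """Rescale every report so its benchmark series matches the first report.

    `reports` is a list of dicts mapping keyword -> list of numbers.
    Every report must contain the `benchmark` keyword.
    """
    reference = reports[0][benchmark]
    ref_total = float(sum(reference))
    merged = {benchmark: list(reference)}
    for report in reports:
        total = float(sum(report[benchmark]))
        if total == 0:
            raise ValueError("benchmark has no volume in this report")
        # Factor that brings this report onto the first report's scale.
        factor = ref_total / total
        for keyword, values in report.items():
            if keyword != benchmark:
                merged[keyword] = [value * factor for value in values]
    return merged


# Two made-up reports that share "bread" as the benchmark keyword.
report_a = {"bread": [1.0, 2.0, 1.0], "banana": [0.5, 0.5, 1.0]}
report_b = {"bread": [2.0, 4.0, 2.0], "bakery": [1.0, 1.0, 3.0]}

merged = merge_on_benchmark([report_a, report_b], benchmark="bread")
print(merged["bakery"])
```

In the second report the benchmark is twice as large as in the first, so every other keyword in that report is halved before the two are compared.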

To use pyGTrends, the quasi-Google-Trends-API, download the file from our server.

Here is an example of the most basic report that you can pull down from Google Trends. The pyGTrends constructor needs your Google login details, and download_report needs to be passed a list of keywords.

from pyGTrends import pyGTrends

connector = pyGTrends('google username','google password')
connector.download_report(('keyword1', 'keyword2'))
print connector.csv()

You can, however, use pyGTrends to get any slice of data that you can pull down from Google Trends. To see the exact parameters that you should use, go to Google Trends and navigate to the specific report that you are interested in, narrowed down as far as you need. Then right-click on the CSV download link and copy the link location. The different parameters should be discernible from the link. The following code downloads a report for the keywords banana, bread, and bakery from April 2008, originating from the magnificent nation of Austria, and scaled using fixed scaling (the second of the two download links).

connector.download_report(('banana', 'bread', 'bakery'), 
                          date='2008-4', 
                          geo='AT', 
                          scale=1)
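If reading parameters out of a long link by eye is tedious, the standard library can do it. The sketch below is not part of pyGTrends, and the link in it is a made-up placeholder on example.com, not a real Google Trends address; paste in the link you copied from your browser instead. The helper name link_parameters is also just for this example.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical link, for illustration only. Replace it with the CSV link
# that you copied from your own Google Trends report.
csv_link = ("http://example.com/trends/export"
            "?q=banana,bread,bakery&date=2008-4&geo=AT&scale=1")


def link_parameters(url):
    """Return the query-string parameters of a URL as a flat dict."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list of values per key; keep the first of each.
    return {name: values[0] for name, values in query.items()}


params = link_parameters(csv_link)
for name in sorted(params):
    print(name, "=", params[name])
```

Whatever names and values show up in your real link are the ones to compare against the keyword arguments of download_report.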

By default, the csv() function returns the main part of the report, but there are a few additional sections stuck to the bottom of the CSV file. If you are interested in those, pass the section parameter to the csv() function. If you do not want column headers on your data, you can also pass the column_headers parameter as False. The following will return the Language section without any headers.

print connector.csv(section='Language', column_headers=False)
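To make the idea of sections concrete, here is a small standalone sketch of how a multi-section CSV could be pulled apart. It is not the pyGTrends implementation. It assumes that sections are separated by blank lines and that the first cell of each section's header row names the section; the real download may be laid out differently. The sample text and the function names split_sections and section_rows are invented for this example.

```python
import csv
import io


def split_sections(text):
    """Split CSV text into blocks separated by one or more blank lines."""
    sections, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            sections.append(current)
            current = []
    if current:
        sections.append(current)
    return sections


def section_rows(text, name, column_headers=True):
    """Return the rows of the section whose header row starts with `name`."""
    for block in split_sections(text):
        rows = list(csv.reader(io.StringIO("\n".join(block))))
        if rows[0][0] == name:
            return rows if column_headers else rows[1:]
    raise KeyError(name)


# Made-up sample with a main section and a Language section.
sample = (
    "Week,banana\n"
    "2008-04-06,1.2\n"
    "\n"
    "Language,banana\n"
    "German,1.0\n"
    "English,0.4\n"
)

print(section_rows(sample, "Language", column_headers=False))
```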

Here is a snapshot from the new Google Trends to add some eye-candy to the post: Google Trends Eye-Candy

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. All source code is released under a BSD License unless otherwise specified.

June 24, 2008
Sal Uryasev said:

Yes and No.
I believe that Google tends to be generous regarding use of their services. They want people to get the maximum utility out of the products, but they don't want their generosity to be abused. The login requirement for downloading the Google Trends data is probably there just for that reason. The cap is probably quite large, but there certainly is one. I wouldn't build a webservice (without having users use their own account). You may have more luck if you lump many keywords per call, and spread out your data gathering over longer periods of time.


July 21, 2008
James Solo said:

This Python script is great and provides an excellent solution. However, I have never used Python before so I was hoping someone could email (james.solo |AT| mathworks.com) me step by step instructions on how to modify this script to work with keywords of my choice ( I have 40 total ) and to grab data from 2004 to-date using the "CSV with relative scaling" data file.

Many thanks,
James


July 21, 2008
Arjun said:

I have been using the pyGTrends module and have encountered problems when using keywords with more than one word. For instance, "air express" was one of the keywords. It has a search history--when I manually download the data from Google Trends, the historical data shows up fine. However, when I use the pyGTrends module, the data comes out as all zeroes.

The same problem occurs with all keywords/phrases that contain more than one word. Is pyGTrends compatible with only one-word keywords, and if not, how do I fix this problem?

If someone could email me, arjunrmodi at gmail dot com, that would be great. Thanks.


July 21, 2008
A said:

Great module.


July 21, 2008
Sal Uryasev said:

Arjun: It should work now. Thanks for pointing it out!

Juiced Google Analytics Python API

It is not official. It is not from Google. It is, however, very functional and very here. I present to you pyGAPI, the Juiced Google Analytics Python API. This module allows you to pull information from your Google Analytics account and use it programmatically in your reporting code.

Let us use IPython to walk through some code using pyGAPI.

In [3]: from datetime import date
In [4]: import pyGAPI
In [5]: connector = pyGAPI.pyGAPI(username, password, website_id="1234567")

Here we create a pyGAPI object. Behind the scenes, pyGAPI logs into Google Analytics and downloads an identifying cookie. website_id is optional; if it is omitted, pyGAPI uses the first website on the account's list. To get a list of all the site IDs to which your account has access, call connector.list_sites().

In [6]: connector.download_report('KeywordsReport', (date(2008,3,10), date(2008,3,31)), limit=5)

This downloads a report into your pyGAPI object. KeywordsReport is the name of the report. It is followed by a tuple containing the start and end dates as Python date objects. limit is an optional parameter that specifies the number of entries that pyGAPI should pull down. By default, it pulls in all the entries, up to a maximum of 10000. Lowering this number should improve performance. The entries returned are ranked by Visits, so even a small limit gives you the most significant rows.

In [7]: print connector.csv()
Keyword,Visits,Pages/Visit,Avg. Time on Site,% New Visits,Bounce Rate,Visits,Subscribe,Solutions,Goal Conversion Rate,Per Visit Goal Value
juice analytics,356,5.935393258426966,314.061797752809,0.38764044642448425,0.29494380950927734,356,1.0,0.16292135417461395,1.1629213094711304,0.0
excel training,142,1.971830985915493,98.0774647887324,0.908450722694397,0.6901408433914185,142,1.0,0.0211267601698637,1.0211267471313477,0.0
excel charts,77,1.7922077922077921,95.0,0.9090909361839294,0.7792207598686218,77,1.0,0.03896103799343109,1.0389610528945923,0.0
excel skills,72,1.6527777777777777,75.29166666666667,0.9444444179534912,0.7083333134651184,72,1.0,0.0,1.0,0.0
colbert bump,70,1.3142857142857143,113.77142857142857,0.6428571343421936,0.8428571224212646,70,1.0,0.0,1.0,0.0

This function returns your report in a nice Excel-ready CSV format.

In [8]: print connector.parse_csv_as_dicts(convert_numbers=True)
[{'Avg. Time on Site': 314.06179775280901, 'Per Visit Goal Value': 0.0, 'Bounce Rate': 0.29494380950927734, 'Keyword': 'juice analytics', 'Visits': 356.0, 'Pages/Visit': 5.9353932584269664, 'Subscribe': 1.0, 'Solutions': 0.16292135417461395, '% New Visits': 0.38764044642448425, 'Goal Conversion Rate': 1.1629213094711304}, {'Avg. Time on Site': 98.077464788732399, 'Per Visit Goal Value': 0.0, 'Bounce Rate': 0.69014084339141846, 'Keyword': 'excel training', 'Visits': 142.0, 'Pages/Visit': 1.971830985915493, 'Subscribe': 1.0, 'Solutions': 0.021126760169863701, '% New Visits': 0.90845072269439697, 'Goal Conversion Rate': 1.0211267471313477}, {'Avg. Time on Site': 95.0, 'Per Visit Goal Value': 0.0, 'Bounce Rate': 0.77922075986862183, 'Keyword': 'excel charts', 'Visits': 77.0, 'Pages/Visit': 1.7922077922077921, 'Subscribe': 1.0, 'Solutions': 0.038961037993431091, '% New Visits': 0.90909093618392944, 'Goal Conversion Rate': 1.0389610528945923}, {'Avg. Time on Site': 75.291666666666671, 'Per Visit Goal Value': 0.0, 'Bounce Rate': 0.70833331346511841, 'Keyword': 'excel skills', 'Visits': 72.0, 'Pages/Visit': 1.6527777777777777, 'Subscribe': 1.0, 'Solutions': 0.0, '% New Visits': 0.94444441795349121, 'Goal Conversion Rate': 1.0}, {'Avg. Time on Site': 113.77142857142857, 'Per Visit Goal Value': 0.0, 'Bounce Rate': 0.84285712242126465, 'Keyword': 'colbert bump', 'Visits': 70.0, 'Pages/Visit': 1.3142857142857143, 'Subscribe': 1.0, 'Solutions': 0.0, '% New Visits': 0.6428571343421936, 'Goal Conversion Rate': 1.0}]

This function goes the extra step and converts the CSV into a list of dictionaries, one per row, for easier programmatic use. By default, all values are returned as Python strings. Setting convert_numbers to True, as we did here, additionally converts every numeric value to a float.
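For readers curious how such a conversion can be done, here is a minimal sketch using only the standard library. It is not pyGAPI's actual code; the function names csv_to_dicts and to_number and the short sample are assumptions made for illustration. Note that when two columns share a name, as Visits does in the CSV above, a dictionary can only keep one of them.

```python
import csv
import io


def to_number(value):
    """Return value as a float if it looks numeric, else leave it unchanged."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return value


def csv_to_dicts(text, convert_numbers=False):
    """Turn CSV text with a header row into a list of dicts, one per row."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        if convert_numbers:
            row = {key: to_number(value) for key, value in row.items()}
        rows.append(dict(row))
    return rows


# Shortened, made-up sample in the same shape as the report above.
sample = (
    "Keyword,Visits,Bounce Rate\n"
    "juice analytics,356,0.25\n"
    "excel training,142,0.5\n"
)

for entry in csv_to_dicts(sample, convert_numbers=True):
    print(entry["Keyword"], entry["Visits"], entry["Bounce Rate"])
```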

In [9]: print connector.list_reports()
('ReferringSourcesReport', 'SearchEnginesReport', 'AllSourcesReport', 'KeywordsReport', 'CampaignsReport', 'AdVersionsReport', 'TopContentReport', 'ContentByTitleReport', 'ContentDrilldownReport', 'EntrancesReport', 'ExitsReport', 'GeoMapReport', 'LanguagesReport', 'HostnamesReport', 'SpeedsReport')

This returns a list of all the reports that I have successfully tested thus far. All site-specific reports should work. A couple of site-section-specific reports should be included in the next update of pyGAPI.

Google is great and will release a real API soon, but until then you can download pyGAPI.


June 21, 2008
Matt Webb said:

This is awesome work. Do you think this python script could work in conjunction with superkaramba on Linux?


June 27, 2008
Rodrigo said:

This is great. I put this together with a Samurize desktop to display Analytics data on my desktop.
Thanks!


August 7, 2008
Ludovic said:

Very nice work. Very useful to, let's say get your most visited pages without having to maintain parallel accounting. May I ask you to licence it to an OSS licence and put it on Google Code ? Would be great.


August 20, 2008
Sebastian said:

Hello,

It works well! Great.
How can I pull the "keyword" or "country" report for a specific URL?
(using segmentation)

Thanks


September 5, 2008
Thierry said:

Awesome work !
