Programmatic Google Trends API

Yesterday, Google released an update to their popular Google Trends tool. There are several improvements over the previous version, but the biggest new feature is a shiny new button that lets you download all your data as a CSV file. This is a very cool enhancement. Where Google Trends was once a geeky toy, it can now be integrated into analysts' reports and, with that, edge its way onto managerial desks.

This Python module is a quasi-API that makes it easier to authenticate into Google Trends, for those who want to squeeze an extra level of functionality out of their data. The advantage of programmatic access is that the data can be automatically trended and merged. It can be snuck into a 9:00 AM daily email to the VP of Marketing so that she knows to ramp up Google AdWords campaigns for some specific keyword. Also, by programmatically pulling multiple reports, it is possible to create a wealth of data not visible in a single report. Using one keyword as a benchmark to merge multiple reports, we can do a meaningful comparison on tens or hundreds of relevant keywords.
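As a rough illustration of the benchmark idea, here is a minimal sketch written for Python 3. It is not part of pyGTrends: the function name `merge_on_benchmark` and the layout of a report as a dict mapping each keyword to a list of values are assumptions made for this example. Each additional report is rescaled so that the shared benchmark keyword sits at the same average level as in the first report.

```python
def merge_on_benchmark(reports, benchmark):
    """Merge several reports that all contain the same benchmark keyword.

    `reports` is a list of dicts mapping keyword -> list of numbers.
    This layout is an assumption for the sketch, not a pyGTrends format.
    """
    base = reports[0]
    base_level = sum(base[benchmark]) / len(base[benchmark])

    # Start from a copy of the first report; its scale becomes the common scale.
    merged = {keyword: list(values) for keyword, values in base.items()}

    for report in reports[1:]:
        level = sum(report[benchmark]) / len(report[benchmark])
        if level == 0:
            raise ValueError("benchmark keyword has no volume in this report")
        # Factor that brings this report's benchmark to the first report's level.
        factor = base_level / level
        for keyword, values in report.items():
            if keyword != benchmark:
                merged[keyword] = [value * factor for value in values]
    return merged


# Made-up numbers, purely to show the rescaling.
report_a = {"bread": [2.0, 4.0], "banana": [1.0, 1.0]}
report_b = {"bread": [1.0, 2.0], "bakery": [0.5, 1.5]}
merged = merge_on_benchmark([report_a, report_b], benchmark="bread")
```

Here `bread` is half as large in the second report as in the first, so `bakery` is doubled before it joins the merged table. Matching on the average level is only one possible choice; matching on a single reference date would work along the same lines.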

To use pyGTrends, the quasi-Google-Trends-API, download the latest version from GitHub.

Here is an example of the most basic report that you can pull down from Google Trends. The pyGTrends constructor needs your authentication info, and download_report needs to be passed a sequence of keywords.

from pyGTrends import pyGTrends

connector = pyGTrends('google username','google password')
connector.download_report(('keyword1', 'keyword2'))
print connector.csv()

You can, however, use pyGTrends to get any slice of data that you can pull down from Google Trends. To see the exact parameters that you should use, go to Google Trends, and navigate to the specific sufficiently-narrow report that you are interested in. Then, right-click on the CSV download, and save the link location. The different parameters should be discernible from the link. The following code downloads a report for banana, bread, and bakery keywords from April 2008, originating from the magnificent nation of Austria, and scaled using fixed scaling (aka the second download link).

connector.download_report(('banana', 'bread', 'bakery'), 
                          date='2008-4', 
                          geo='AT', 
                          scale=1)
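If reading the parameters off the link by eye gets tedious, a small helper along these lines may help. It is a sketch written for Python 3, using only the standard library. The URL below is made up for illustration (the real link copied from Google Trends will look different), and the assumption that the keywords travel in a comma-separated `q` parameter should be checked against your own link.

```python
from urllib.parse import urlparse, parse_qs


def link_to_params(link):
    """Split a copied CSV link into (keywords, other query parameters)."""
    query = parse_qs(urlparse(link).query)
    # parse_qs returns a list per parameter; keep the first value of each.
    params = {name: values[0] for name, values in query.items()}
    # Assumption: keywords are comma-separated in a parameter named "q".
    raw_keywords = params.pop("q", "")
    keywords = tuple(k.strip() for k in raw_keywords.split(",") if k.strip())
    return keywords, params


# Hypothetical link, not a real Google Trends URL.
link = "https://trends.example.com/viz?q=banana%2Cbread%2Cbakery&date=2008-4&geo=AT&scale=1"
keywords, params = link_to_params(link)
```

All values come back as strings (for example `'1'` for `scale`), so convert them where a number is wanted before handing them on as keyword arguments.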

By default, the csv() function returns the main part of the report, but there are a few additional parts stuck to the bottom of the CSV file. If you are interested in those, pass the section parameter to the csv() function. If you do not want column headers on your data, you can also pass the column_headers parameter as False. The following will return the Language section without any headers.

print connector.csv(section='Language', column_headers=False)
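To make the idea of sections concrete, here is a minimal sketch (Python 3) of how a multi-part CSV could be split by hand. It is not the pyGTrends implementation. It assumes that sections are separated by blank lines and that each section starts with a header row whose first cell names the section. The function names and the sample text are made up for illustration.

```python
def split_sections(raw_report):
    """Map the first header cell of each section to that section's lines."""
    sections = {}
    # Assumption: one or more blank lines separate the sections.
    for block in raw_report.replace("\r\n", "\n").split("\n\n"):
        lines = [line for line in block.split("\n") if line.strip()]
        if not lines:
            continue
        name = lines[0].split(",")[0].strip()
        sections[name] = lines
    return sections


def section_csv(raw_report, section, column_headers=True):
    """Return one section as CSV text, optionally without its header row."""
    lines = split_sections(raw_report)[section]
    if not column_headers:
        lines = lines[1:]
    return "\n".join(lines)


# Made-up report text in the assumed layout.
SAMPLE = (
    "Week,banana,bread\n"
    "2008-04-06,10,20\n"
    "2008-04-13,12,18\n"
    "\n\n"
    "Language,banana,bread\n"
    "German,5,7\n"
    "English,3,9\n"
)
language_rows = section_csv(SAMPLE, "Language", column_headers=False)
```

Keying each section by its first header cell keeps the lookup simple, at the cost of breaking if two sections ever share the same first column name.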

Here is a snapshot from the new Google Trends to add some eye-candy to the post:

[Image: Google Trends Eye-Candy]

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. All source code is released under a BSD License unless otherwise specified.

24 comments (only the last 5 are shown)


May 8, 2009
redneckjedi said:

Thanks Sal. That's what I've been trying, and it works now that I'm on a different computer with a different IP. Any idea on how long I'll have to wait before I can run it on my original machine?


May 8, 2009
Sal Uryasev said:

I have no idea... sorry. If you find out though, this might be a good piece of information to stub into the post above.


May 17, 2009
atleta said:

redneckjedi: I've also run into this. No idea for how long they will ban your IP but you could use a proxy in the meanwhile.


May 18, 2009
Sal Uryasev said:

It looks like it is just a daily limit. There is a forum discussion about it here: http://www.thirtydaychallenge.com/forums/general-chat/5364-google-trends-limit-wtf.html


June 26, 2009
pstobbs said:

Hi Sal
Great tool- very useful indeed, many thanks.

I'm new to Python but pulled together a little code to cycle through different keyword inputs. It seemed to be working fine but then I ran into the issue mentioned by redneckjedi, unfortunately. I had included a sleep(10) but clearly that wasn't enough and I am now getting HTTP Error 503: Service Unavailable. When I access the site manually I am greeted with a captcha, so I can still run the queries manually, just not through the script. Has anyone else come across this? I hope the block will stop soon and I will then add a much longer sleep. I only have about 20 or so keyword sets, which should be ok I think?

Anyone any ideas how long the ban lasts?
cheers
Piers

Below is my rudimentary loop which I'm sure could be improved upon

#----- import pyGTrends
import sys
sys.path.append('c:\\python_code')
from pyGTrends import pyGTrends
connector = pyGTrends('username','password')

#----- initiate output
import time
final_output = str()

#----- load csv inputs file (each row: five keywords, then a geo code)
import csv
gtrends_input = csv.reader(open("google_trends_inputs.csv","rb"))
gtrends_input_list = []
gtrends_input_list.extend(gtrends_input)

#---- define loop (starts at 1, so the first row of the inputs file is skipped)
loop_range = range(1,len(gtrends_input_list))

for x in loop_range:

    #---- parameters
    tmp_row = gtrends_input_list[x]
    tmp_terms = tmp_row[:5]
    tmp_geo = tmp_row[5]

    #---- save parameters to output
    final_output = final_output + tmp_geo + '\n'

    #---- get data
    connector.download_report((tmp_terms),geo=tmp_geo,scale=1)

    #---- append output
    tmp_output = connector.csv() + '\n'
    final_output = final_output + tmp_output

    #---- pause between requests, then report progress
    time.sleep(100)
    print 'done',x, tmp_terms, tmp_geo

#---- end loop

#---- output

outfile = open('output.csv','w')
outfile.write(final_output)
outfile.close()
