Programmatic Google Trends API
By Sal Uryasev
June 11, 2008
Updated October 21, 2009
Yesterday, Google released an update to their popular Google Trends tool. There are improvements over the previous version, but the biggest new feature is a shiny new button that lets you download all your data as a CSV file. This is a very cool enhancement. Where Google Trends was a geeky toy, it can now feed into analysts' reports and, with that, edge its way onto managerial desks.
This Python module is a quasi-API that makes it easier to authenticate into Google Trends, for those who want to squeeze an extra level of functionality out of their data. The advantage of programmatic access is that the data can be automatically trended and merged. It can be snuck into a 9:00 AM daily email to the VP of Marketing so that she knows to ramp up Google AdWords campaigns for some specific keyword. Also, by programmatically pulling multiple reports, it is possible to build up a wealth of data not visible in a single report. Because every report is scaled relative to its own keywords, including one keyword as a benchmark in each report lets us merge them and do a meaningful comparison across tens or hundreds of relevant keywords.
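The benchmark idea can be sketched in a few lines. This is only an illustration in Python 3, not part of pyGTrends: it assumes each report has already been parsed into a dictionary mapping a keyword to its list of relative values, and the function name merge_on_benchmark and the numbers are made up.

```python
def merge_on_benchmark(report_a, report_b, benchmark):
    """Rescale report_b so its benchmark series matches report_a's, then merge.

    Each report maps keyword -> list of relative values for the same periods.
    """
    mean_a = sum(report_a[benchmark]) / len(report_a[benchmark])
    mean_b = sum(report_b[benchmark]) / len(report_b[benchmark])
    if mean_b == 0:
        raise ValueError("benchmark series is all zeros in the second report")
    factor = mean_a / mean_b

    merged = dict(report_a)
    for keyword, values in report_b.items():
        if keyword != benchmark:
            merged[keyword] = [v * factor for v in values]
    return merged


# Made-up numbers: 'bread' is the benchmark keyword present in both reports.
report_1 = {"bread": [1.0, 2.0, 3.0], "banana": [0.5, 0.5, 1.0]}
report_2 = {"bread": [2.0, 4.0, 6.0], "bakery": [1.0, 2.0, 2.0]}
merged = merge_on_benchmark(report_1, report_2, "bread")
```

Matching on the mean of the benchmark series is one simple choice; matching on a single shared period would work too.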
To use pyGTrends, the quasi-Google-Trends-API, you can download the latest version from GitHub.
Here is an example of the most basic report that you can pull down from Google Trends. The pyGTrends constructor needs authentication info, and download_report needs to be passed a list or tuple of keywords.
from pyGTrends import pyGTrends
connector = pyGTrends('google username','google password')
connector.download_report(('keyword1', 'keyword2'))
print connector.csv()
You can, however, use pyGTrends to get any slice of data that you can pull down from Google Trends. To see the exact parameters that you should use, go to Google Trends, and navigate to the specific sufficiently-narrow report that you are interested in. Then, right-click on the CSV download, and save the link location. The different parameters should be discernible from the link. The following code downloads a report for banana, bread, and bakery keywords from April 2008, originating from the magnificent nation of Austria, and scaled using fixed scaling (aka the second download link).
connector.download_report(('banana', 'bread', 'bakery'),
                          date='2008-4',
                          geo='AT',
                          scale=1)
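The parameters can also be read out of the saved link programmatically. A minimal sketch in Python 3, assuming a made-up link: the host, path, and the q parameter name are placeholders, not the actual Google Trends export URL.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical export link; the real host, path and parameter names may differ.
link = ("http://www.example.com/trends/viz"
        "?q=banana,bread,bakery&date=2008-4&geo=AT&scale=1")

# parse_qs returns a list of values per name; keep the first value of each.
query = parse_qs(urlparse(link).query)
params = {name: values[0] for name, values in query.items()}

keywords = params.pop("q").split(",")
print(keywords)
print(params)
```

Note that every value comes back as a string, so a numeric parameter such as scale would need converting before use.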
By default, the csv() function downloads the main part of the report, but there are a few additional parts stuck to the bottom of the CSV file. If you are interested in those, pass the section parameter to the csv() function. The following will return the Language section.
print connector.csv(section='Language')
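To make the idea of sections concrete, here is a rough Python 3 sketch of how a multi-part CSV could be split by hand. It is not the module's implementation. It assumes that sections are separated by blank lines and that the first cell of each section's header row names the section; the sample text and the split_sections helper are made up.

```python
def split_sections(raw_csv):
    """Split CSV text into sections keyed by the first cell of each header row.

    Assumes sections are separated by one or more blank lines.
    """
    sections = {}
    block = []
    # The extra "" at the end flushes the last block.
    for line in raw_csv.splitlines() + [""]:
        if line.strip():
            block.append(line)
        elif block:
            name = block[0].split(",")[0].strip()
            sections[name] = "\n".join(block)
            block = []
    return sections


# Made-up sample shaped like a multi-section export.
sample = (
    "Week,banana,bread\n"
    "2008-04-06,1.0,2.0\n"
    "\n\n"
    "Language,banana,bread\n"
    "German,0.9,1.8\n"
)
sections = split_sections(sample)
print(sections["Language"])
```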
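Full recommended usage pairs csv() with either csv.reader or csv.DictReader from the standard library. DictReader yields one dictionary per row, so iterate over it rather than printing the reader object itself.
from csv import DictReader
for row in DictReader(connector.csv().split('\n')):
    print(row)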
Here is a snapshot from the new Google Trends to add some eye-candy to the post:

How Did We Mash Data into Google Analytics?
By Sal Uryasev
June 6, 2008
This post is the code behind how we mashed external data into Google Analytics.
The first step is to pull reference data from the Google Analytics page to match against Kampyle's data. We specifically want to gather individual names of websites (/index.html, /index2.html) and the currently selected date range. The cell references to the website names in the table can be found using a neat JavaScript Shell popular among Greasemonkey and JavaScript developers. I will not go into detail about the JavaScript Shell, but by checking out the various child nodes for the table object we can track down that document.getElementById('f_table_data').childNodes[3].rows[1].cells[1].textContent points at the text in the first cell of the first row. While the syntax looks long, it is just the nested HTML expressed programmatically.
For the date, Google Analytics uses a slightly peculiar hybrid system where the date is drawn initially from the URL, but if the date is modified with the JavaScript date tool in the upper right-hand corner, it uses that instead. From our end, document.getElementById('f_primaryBegin').value and document.getElementById('f_primaryEnd').value are the date tool values, which only exist once the date tool has been used. Pull these two values if they exist, and simply parse the date from the URL otherwise.
The clickable tab we created is essentially the equivalent of a little Greasemonkey button with a few frills that can be created in the standard Greasemonkey fashion. Wherever possible, I use Google-defined layouts for consistency with the site.
Next, we want to send our reference data to some external server. Greasemonkey has good functionality for pulling data from other sites and servers through the GM_xmlhttpRequest function. A server-side PHP or Django service might be easiest to implement. In this specific example, Kampyle wanted to use the SOAP protocol. While there is an excellent general-purpose SOAP client for JavaScript by Matteo Casati, this client does not work in a plug-and-play fashion with Greasemonkey, and needed some modification. For any devoted SOAPers who want to try Greasemonkey, the revised javascript-soap-client code can be found in the attached file. We also use the SHA-256 hash function written by Angel Marin and Paul Johnston, which only requires copying and pasting the function into our code.
The result comes back as an XML object describing each row in the table, which we parse using native JavaScript/Greasemonkey methods and insert back into the table the same way that we extracted the individual website names. A neat trick here is to request each row separately, without waiting for one row's data to come back before requesting the next. Separate listeners can wait and insert the data at their leisure. This lets our page load faster, and if there is an error with one data element, the rest of the rows can still load in peace.
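The actual script is Greasemonkey JavaScript, but the row-by-row pattern is easy to illustrate in Python 3. In this sketch fetch_row is a hypothetical stand-in for the web service call, and one page is made to fail on purpose to show that the other rows still arrive.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_row(page):
    """Stand-in for a per-row web service call (hypothetical)."""
    if page == "/broken.html":
        raise IOError("service error for %s" % page)
    return {"page": page, "grade": len(page)}


def fetch_all(pages):
    """Submit one request per row; collect results and errors independently."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {pool.submit(fetch_row, page): page for page in pages}
        # Handle each response as it arrives, in whatever order that is.
        for future in as_completed(futures):
            page = futures[future]
            try:
                results[page] = future.result()
            except IOError as exc:
                errors[page] = str(exc)
    return results, errors


results, errors = fetch_all(["/index.html", "/broken.html", "/index2.html"])
```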
You can play around with my code here. This code is released under the BSD License. You won't be able to run the code verbatim without Kampyle's cooperation, since they have changed the API calls on their server. However, much of it should be very portable to other data sources.
2 comments
Gotham said:
This post has great information but I need a few clarifications. For one of my clients I have the usual web analytics info displayed in GA. Additionally, the client has call tracking data in its own database. Can I pull that info into GA in a new tab? Your mashup indicates that you added a new tab called "Kampyle". Are the column names of the table that shows up configurable (e.g. URL, avg grade)?
Sal Uryasev said:
Hey Gotham,
Yes - as long as you have easy access to the data, you can push any data that you want into Google Analytics. If the data is completely static, you can even add it to the script. Alternatively, you could have a hosted file somewhere. In our case, the data was very dynamic, so we used a server with another web service to fetch the data.
If you click on the picture above, it'll show you the entire table, including column names that I changed around. Essentially, you have the power to change any text that you can select by a mouse. It is just a matter of knowing where to point your code.






39 comments
yadab das said:
The PyGTrends.py API looks really fascinating to me. I have almost converted the Code to Java and will publish this week with a Swing Interface. Any better suggestions on that?
Archie said:
Hi Yadab! Could you please share the Java code you have written? I am also working on it. Please contact me by email.
aavaliani (at) gmx.net
Gautham Ramachandran said:
The PyGTrends.py API is really awesome. I have a question though. Does Google frown upon iterative pings to Google Trends to pull relative traffic? To be more specific, if I have 2,000 keywords and I code iterative pulls from Google Trends, do I stand a chance of getting banned?
Gautham.
Sal Uryasev said:
Yes and No.
I believe that Google tends to be generous regarding use of their services. They want people to get the maximum utility out of the products, but they don't want their generosity to be abused. The login requirement for downloading the Google Trends data is probably there just for that reason. The cap is probably quite large, but there certainly is one. I wouldn't build a webservice (without having users use their own account). You may have more luck if you lump many keywords per call, and spread out your data gathering over longer periods of time.
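Lumping keywords together could look something like the Python 3 sketch below. The group size of five is an assumption; adjust it to whatever a single report accepts. The helper name batch_keywords is made up.

```python
def batch_keywords(keywords, size=5):
    """Yield consecutive groups of at most `size` keywords."""
    for start in range(0, len(keywords), size):
        yield tuple(keywords[start:start + size])


keywords = ["kw%d" % i for i in range(12)]
batches = list(batch_keywords(keywords))
print(len(batches))
```

Each batch can then be passed to download_report in turn, with a pause between calls.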
James Solo said:
This Python script is great and provides an excellent solution. However, I have never used Python before so I was hoping someone could email (james.solo |AT| mathworks.com) me step by step instructions on how to modify this script to work with keywords of my choice ( I have 40 total ) and to grab data from 2004 to-date using the "CSV with relative scaling" data file.
Many thanks,
James
Arjun said:
I have been using the pyGTrends module and have encountered problems when using keywords with more than one word. For instance, "air express" was one of the keywords. It has a search history--when I manually download the data from Google Trends, the historical data shows up fine. However, when I use the pyGTrends module, the data comes out as all zeroes.
The same problem occurs with all keywords/phrases that contain more than one word. Is pyGTrends compatible with only one-word keywords, and if not, how do I fix this problem?
If someone could email me, arjunrmodi at gmail dot com, that would be great. Thanks.
A said:
Great module.
Sal Uryasev said:
Arjun: It should work now. Thanks for pointing it out!
CurtD said:
Hi,
I am wondering if you Google Gurus know of how to attack a project I am trying to accomplish. I want to be able to run a script, query, software that can give me the answer to something like the following.."I want to know what search terms have been searched 1500 a day on average and have less than 200,000 results". The 1,500 and 200,000 are variables that can be changed. It would be cool to be able to have a drop down menu to say "last 30 days" "last 90 days" etc. I have posted this project on Elance and Guru and I have mostly gotten "never heard of it, can't be done." Some talked of building a library through API and then running queries on it. Does this query exist or how can I go about getting it done?
Sal Uryasev said:
Hey Curt,
The problem lies with the fact that Google Trends does not have any absolute numbers. You can only get results as they are relative to other results.
It appears to be an intentional omission. The only way to get at that data could be to get hired as a Google employee.
Clemens said:
Hi Sal,
This is really a great tool. I am facing the same issue as Arjun. I am not able to extract keyword combinations for phrases with e.g. two words. Single keywords work fine...
You mentioned that this issue has been resolved. I downloaded the latest version from GitHub. Can you give me some advice on how to extract keyword combinations, or how to adjust the code accordingly?
Thanks in advance, this is really a great work.
Johan said:
Hi,
The code has (obviously) changed with later versions which makes the instructions above harder to follow. Where to enter the keywords, in this line?
def download_report(self, keywords, date='all', geo='all', geor='all', graph = 'all_csv', sort=0, scale=0, sa='N'):
I have never used Python before...
atleta said:
The lib does not work correctly at the moment. It messes up the headers: the header for the first column (e.g. 'Week') is not inserted, because the code generates the headers from the first line, while all the tables ('sections') have one more column, whose name you actually use to identify the section. One solution is therefore to insert the requested section name as the name of the first column.
Sal Uryasev said:
Thanks for the bugfix atleta. I implemented your idea.
Sal Uryasev said:
Clemens,
You can just enter two-word strings wherever you would enter one-word strings: 'google analytics' instead of 'banana'
Johan,
The 'self' is a pythonic reference to the object itself.
There is a bit more of a description here:
http://www.ibiblio.org/g2swap/byteofpython/read/self.html
Gautamm said:
I really loved this and it works great.
I have one question though: whenever I put in a single query, it splits it into individual letters. How does one circumvent this?
Sal Uryasev said:
Hey Gautamm,
The first parameter to download_report needed to be a list or tuple, so something like ('bread') would have had to be ('bread',) or ['bread']. Python can be slightly annoying, where it can treat parentheses as simply parentheses, unless there is a comma to make it a tuple. I added in a small fix so that is no longer necessary, if you want to re-pull the code from git.
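A quick Python 3 illustration of the gotcha, along with one way to guard against it. The normalize_keywords helper is hypothetical and not necessarily how the fix in the module is written.

```python
def normalize_keywords(keywords):
    """Wrap a bare string in a list; leave lists and tuples as they are."""
    if isinstance(keywords, str):
        return [keywords]
    return list(keywords)


print(type(('bread')))    # <class 'str'>: parentheses alone do not make a tuple
print(type(('bread',)))   # <class 'tuple'>: the trailing comma does
print(list('bread'))      # ['b', 'r', 'e', 'a', 'd']
print(normalize_keywords('bread'))              # ['bread']
print(normalize_keywords(('bread', 'banana')))  # ['bread', 'banana']
```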
redneckjedi said:
Sal,
This is great, and it was working great for me. However, after a few calls in an iterative script, I started getting a 503 Service Unavailable message. I'm guessing they figured out I was running a script. Any idea how to get around this?
Sal Uryasev said:
Hey Redneckjedi,
I like your name.
My best guess is to try drastically slowing down your iterative script with sleep(x). Making multiple repeat calls is probably not built into Google's design, so my guess is that you're hitting some sort of rate limit. Google tends to be generous with this sort of stuff, so by spacing out the calls, it may help quite a bit.
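Something along these lines, as a rough Python 3 sketch. The ServiceUnavailable exception is a stand-in for however the 503 surfaces in your script, and the delays are guesses. The fake fetch at the bottom is only there to show the retry behaviour without touching the network.

```python
import time


class ServiceUnavailable(Exception):
    """Stand-in for an HTTP 503 response (hypothetical)."""


def call_with_backoff(fetch, retries=3, delay=60, sleep=time.sleep):
    """Call fetch(); on ServiceUnavailable wait, double the delay, and retry."""
    for attempt in range(retries):
        try:
            return fetch()
        except ServiceUnavailable:
            if attempt == retries - 1:
                raise  # out of retries, let the caller see the error
            sleep(delay)
            delay *= 2


# Demonstration with a fake fetch that fails twice, then succeeds.
attempts = []
waits = []


def flaky_fetch():
    attempts.append(1)
    if len(attempts) < 3:
        raise ServiceUnavailable()
    return "csv data"


# Record the waits instead of actually sleeping.
result = call_with_backoff(flaky_fetch, sleep=waits.append)
```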
redneckjedi said:
Thanks Sal. That's what I've been trying, and it works now that I'm on a different computer with a different IP. Any idea on how long I'll have to wait before I can run it on my original machine?
Sal Uryasev said:
I have no idea... sorry. If you find out though, this might be a good piece of information to stub into the post above.
atleta said:
redneckjedi: I've also run into this. No idea for how long they will ban your IP but you could use a proxy in the meanwhile.
Sal Uryasev said:
It looks like it is just a daily limit. There is a forum discussion about it here: http://www.thirtydaychallenge.com/forums/general-chat/5364-google-trends-limit-wtf.html
pstobbs said:
Hi Sal
Great tool- very useful indeed, many thanks.
I'm new to Python but pulled together a little code to cycle through different keyword inputs. It seemed to be working fine, but then I unfortunately ran into the issue mentioned by redneckjedi. I had included a sleep(10), but clearly that wasn't enough, and I am now getting Http Error 503: Service Unavailable. When I access the site manually I am greeted with a captcha, so I can still run the queries manually, just not through the script. Has anyone else come across this? I hope the block will stop soon, and I will then add a much longer sleep. I only have about 20 or so keyword sets, which should be OK I think?
Anyone any ideas how long the ban lasts?
cheers
Piers
Below is my rudimentary loop which I'm sure could be improved upon
#----- import pygtrends
import sys
sys.path.append('c:\\python_code')
from pyGTrends import pyGTrends
connector = pyGTrends('username','password')
#----- initiate output file
import time
final_output = str()
#-----load csv inputs file
import csv
gtrends_input = csv.reader(open("google_trends_inputs.csv","rb"))
gtrends_input_list = []
gtrends_input_list.extend(gtrends_input)
#---- define loop
loop_range = range(1,len(gtrends_input_list))
for x in loop_range:
    #---- parameters
    tmp_row = gtrends_input_list[x]
    tmp_terms = tmp_row[:5]
    tmp_geo = tmp_row[5]
    #---- save parameters to output
    final_output = final_output + tmp_geo + '\n'
    #---- get data
    connector.download_report(tmp_terms, geo=tmp_geo, scale=1)
    #---- append output
    tmp_output = connector.csv() + '\n'
    final_output = final_output + tmp_output
    time.sleep(100)
    print 'done', x, tmp_terms, tmp_geo
#---- end loop
#---- output
outfile = open('output.csv','w')
outfile.write(final_output)
outfile.close()
Dheeru said:
My IP was blocked for a day after submitting less than 100 queries.
Bas said:
resp = self.opener.open(self.url_ServiceLoginBoxAuth,params).read();
print " resp ::"+resp;
Im getting the response from google that says "Your browser's cookie functionality is turned off. Please turn it on". ie, not able to login to google account for trends.
Please help me out.
Sal Uryasev said:
Alright.
It turns out that Google changed their Login algorithm. Where in the past I could construct a unique 11-digit ID out of thin air, the new algorithm matched the 11-digit ID against one written into a cookie.
The reason why jcp20 were failing some of the time is probably due to Google rolling out their new login algorithm, and some of the calls would hit an older server, while others would fail on a new server. The data pulls should work consistently at this point.
I have also simplified the convenience functions, with the expectation that the user uses the csv.reader or the csv.DictReader features of the standard library.
Recommended example:
from csv import DictReader
from pyGTrends import pyGTrends
r = pyGTrends(username, password)
r.download_report(('pants', 'skirt'))
d = DictReader(r.csv().split('\n'))
Josh said:
This is a great idea--I've been waiting for google to roll out their long promised API...
Has anybody but Dheeru experienced the limitations of this API. I would like to set up a datasource to integrate cleanly into a cakePHP app, but it doesn't seem worth the while if google cuts off access to an account/ip(?) after a small number of queries.
Anybody have any specifics on this?
SivaThumma said:
No doubt, This code is an excellent start.,
But As-it-is I tried your code, I am getting
Exception("Could not find requested section") [line number:101]
Can anyone reply on what should I do exactly ?
I would Thank anyone who takes kindness to mail me at sivatumma@hotmail.com.
Saravanan said:
Hi Sal,
Thank you so much for the wonderful code - really looks promising.
A quick question..I had used the following script
from pyGTrends import pyGTrends
from csv import DictReader
r = pyGTrends('my email', 'passwd')
r.download_report(('spain', 'wine'))
print DictReader(r.csv().split('\n'))
As a result of it, one "pyGTrends.pyc" got created and output was printing as "<csv.DictReader instance at 0x8b8998>". I didn't see any error display btw..
How do i get the CSV output of the report? Could you please assist with me on this?
Thanks
David said:
I have the same questions as the above.
also, is there a document for using this API.
Thanks
Aloysius Adrian said:
I want to ask about section in csv function. I was trying to get a certain csv, but the interpreter stops at line 115 that produces error message : "KeyError: 'main'"
I want to ask about the other parameter. I can not print the csv because of the section parameter there.
Thanks.
Sal Uryasev said:
DictReader is a convenience class in Python's csv module that reads each row of data into a dictionary. Its use is optional. Printing the DictReader itself only shows the object; loop over it to get the rows.
One thing that can be done if there are any issues is to print connector.raw_data, which displays the direct result from Google Trends. Google sometimes displays additional username/login related problems that the module may not have accounted for.
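For example, here is a small Python 3 sketch of reading rows out of a DictReader. The CSV text is made up and simply stands in for the string that connector.csv() returns.

```python
from csv import DictReader

# Made-up CSV text standing in for the string returned by connector.csv().
csv_text = "Week,spain,wine\n2009-01-04,1.2,0.8\n2009-01-11,1.1,0.9\n"

# DictReader accepts any iterable of lines and yields one dict per data row.
rows = list(DictReader(csv_text.splitlines()))
for row in rows:
    print(row["Week"], row["spain"], row["wine"])
```

All values come back as strings, so convert them with float() before doing any arithmetic.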
David said:
From yesterday, This script can't do print connect.csv(). It always returns "Could not find requested section" error. Do you know where the problem is?
I also download the original script in case I modify anything but the error remains.
Thanks
Aloysius Adrian said:
@David
I, also, can not get the connect.csv()
I contacted Mr. Uryasev, and he suggested that I try print connect.raw_data
I did that, but the result/output was : "You must be signed in to export data from Google Trends"
I wonder if you also get that kind of output..
Eric Wilson said:
I have the same problem as David and Aloysius. If anyone has overcome this, please let me know.
D'Artagnan said:
How do you select multiple years? e.g. 2008 2009 I've tried 2008-2009 and 1/2008 12m but no luck...
David Drace said:
Thanks so much for this. I've just launched a site that uses this API to build a U.S. map tracking "flea" search activity, indicating flea prevalence. Have a look: http://banfieldfleafighter.com.
To do this, I set up a cron to poll Google Trends once every morning and write a small CSV file. This file gets parsed by PHP and fed to Flash with every visit to the site.
Aside from the very occasional "Cannot parse GALX out of login page" error, it works like a charm.
Sherin seo said:
Great post..It will be really helpful in my reporting structuring !!!Can u just update something about Machros???I am little confused about it!!
For sherin--> www.copperbridgemedia.com ..