Python Geocoding Help
By Chris Gemignani
February 28, 2006
Find more about:
googleearth
juice
python
Yahoo recently released a nifty geocoder API that's free for small (<50,000 lookups per day), non-commercial applications. Rasmus Lerdorf (Yahoo's PHP king) has written a nice introduction to using this geocoder in your PHP apps. In that spirit, here's a cheap and cheerful Python class that we use to geocode addresses.
from xml.dom.minidom import parse
from urllib.parse import urlencode
from urllib.request import urlopen


class GeocoderError(Exception):
    """Raised when the lookup or the parsing of its response fails."""


class Geocoder:
    """
    Look up a location using the Yahoo geocoding API.

    Requires a Yahoo appid, which can be obtained at:
    http://developer.yahoo.net/faq/index.html#appid
    Documentation for the Yahoo geocoding API can be found at:
    http://developer.yahoo.net/maps/rest/V1/geocode.html
    """

    def __init__(self, appid, address_str):
        self.address_str = address_str
        self.addresses = []
        self.result_count = 0
        parms = {'appid': appid, 'location': address_str}
        try:
            url = 'http://api.local.yahoo.com/MapsService/V1/geocode?' + urlencode(parms)
            # parse the xml contents of the url into a dom
            dom = parse(urlopen(url))
            results = dom.getElementsByTagName('Result')
            self.result_count = len(results)
            for result in results:
                d = {'precision': result.getAttribute('precision'),
                     'warning': result.getAttribute('warning')}
                for itm in result.childNodes:
                    # if precision is zip, Address childNode will not exist
                    if itm.childNodes:
                        d[itm.nodeName] = itm.childNodes[0].data
                    else:
                        d[itm.nodeName] = ''
                self.addresses.append(d)
        except Exception as exc:
            # wrap any network or parsing failure in a single error type
            raise GeocoderError(str(exc))

    def __repr__(self):
        s = "Original address:\n%s\n\n" % self.address_str
        s += "%d match(s) found:\n\n" % self.result_count
        for addr in self.addresses:
            s += """Match precision: %(precision)s
Location: (%(Latitude)s,%(Longitude)s)
%(Address)s
%(City)s, %(State)s %(Zip)s
""" % addr
        return s


if __name__ == "__main__":
    sample_addresses = ['555 Grove St. Herndon,VA 20170',
                        '1234 Greeley blvd, springfeld, va, 22152',
                        '50009']
    for addr in sample_addresses:
        g = Geocoder('YahooDemo', addr)
        print('-' * 80)
        print(g)
All you need to use this is a Yahoo application id.
You now have four different ways to geocode your company's vital address. If you have suggestions or improvements, let us know. This code is public domain.
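If you'd like to see what the parsing loop produces without calling the service, here's a minimal sketch. The XML below is a made-up response shaped after the fields the class reads (Latitude, Longitude, Address, City, State, Zip). The values and the parse_results helper are assumptions for illustration, not part of the Yahoo API. Unlike the class above, this version also skips the whitespace text nodes between elements.

```python
from xml.dom.minidom import parseString

# Hypothetical response: the structure is inferred from the fields the
# Geocoder class reads, and the values are placeholders, not real output.
SAMPLE_XML = """<ResultSet>
<Result precision="zip" warning="">
<Latitude>0.0</Latitude>
<Longitude>0.0</Longitude>
<Address></Address>
<City>Sampletown</City>
<State>ZZ</State>
<Zip>50009</Zip>
</Result>
</ResultSet>"""


def parse_results(xml_text):
    """Turn each <Result> element into a dict of its attributes and children."""
    dom = parseString(xml_text)
    addresses = []
    for result in dom.getElementsByTagName('Result'):
        d = {'precision': result.getAttribute('precision'),
             'warning': result.getAttribute('warning')}
        for itm in result.childNodes:
            # skip the whitespace text nodes that sit between elements
            if itm.nodeType != itm.ELEMENT_NODE:
                continue
            # an empty element (such as Address for a zip-only match) has no children
            d[itm.nodeName] = itm.childNodes[0].data if itm.childNodes else ''
        addresses.append(d)
    return addresses


if __name__ == "__main__":
    for match in parse_results(SAMPLE_XML):
        print(match)
```

Keeping the parsing in its own function like this makes it easy to test the dictionary-building logic against saved responses.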
Go with the flow in data display
By Zach Gemignani
February 27, 2006
Find more about:
visualization
We spent the last couple of days working with a client on displaying data for real-time dashboards. It got me thinking: Are there implicit assumptions and mental habits that people bring to data interpretation? And if so, are there some basic practices to consider for visualizing data?
Which isn't to say there is one right and perfect way to display any particular data; there is room for both creativity and structure. (Check out Information Aesthetics for examples of creative data visualization.) But in the world of management communication, it can't hurt to be aware of your audience's ingrained assumptions. You want the smoothest path to your important points. The risk is in missing your tiny window to focus a frazzled executive's mind on your point -- and finding that your carefully constructed analysis gets sidetracked.
Here's a starter list of these embedded assumptions:
1. Axes are often the last thing people look at in a chart.
They expect time to progress from left to right and linear scales that start at zero. If two charts are adjacent, they will probably assume the axes and scales are the same. When it comes to the famous two-by-two consulting matrix, good things happen in the upper-right; bad things are in the lower-left. That said, I'm mystified that the famous BCG growth/share matrix insists on rejecting my new rule.
2. Fluff. Dressing up your display implies you aren't comfortable with the data's ability to stand on its own or you don't have much to say. This can include clip art, data incorporated into pictures, and animation. USA Today is particularly good at this. Check out a couple of examples from their Snapshots section. They have fewer than three numbers to communicate, but fill the space with eye-catching graphics.


3. Point of focus. Most data displays have a clear point of focus for the viewer, whether the presenter intends it or not. It could be the peak in a line chart, values crossing over zero, or a sudden change in values. In a chart like this (below), your intention may be to highlight the general growth trend -- but you can't avoid the inevitable questions about the drop after 2000. You can short-circuit these off-topic questions with an explanatory footnote or annotation. Ask yourself: what is the main point I want the reader to get, and what else will my data presentation imply?

4. Proximity and size. Placing information close together suggests a connection. Sometimes accidental proximity can cause confusion. You might present two unrelated phenomena next to each other and the audience will automatically try to draw a connection (e.g. dogs have big teeth; teeth are good for crunching carrots. Audience thinks: dogs must like to crunch carrots). I just ran across Live Plasma, a great site that lets you enter a musical artist (or band, movie, director, or actor) and then shows you related artists. The designers of this data visualization do a great job of building on our data display expectations by using size and proximity to show related artists.

3 comments
Robbin Steif said:
Live Plasma looks cool but is not intuitive enough. You point out that the axes are the last-looked-at, but here I found myself desperately searching for a legend to understand if size or proximity or color matters.
Robbin Steif
LunaMetrics (http://www.lunametrics.com/?source=blog&segment=other)
Zach said:
Good point, Robbin. It is hard to find the meanings for the size, color, and proximity on the site.
Mary said:
Speaking of mental habits, as you were at the beginning of this article, I am wondering if you have spent any time reading about Art Costa's Habits of Mind ideas. It is, of course, education, not business, oriented.
(Re) Introducing Absolutely Google Earth
By Juice Alumni
February 27, 2006
Find more about:
googleearth
juice
google
A while back we released a collection of tools and resources for Google Earth. We've restructured the page a bit and added a few new links. Check out the new version and make sure to let us know if you have anything to add.
Scaring Your Users
By Juice Alumni
February 24, 2006
Find more about:
design
interface
With the release of screenshots of Microsoft Office 12, I started thinking about how drastic a change it is from their current interface design. Aren't they worried about scaring away the users who are so comfortable with the current design? Probably not:
- Microsoft Office has no real competitors and isn't too worried (yet) about losing users.
- Microsoft has to justify to their users that spending a few hundred dollars on an upgrade is worth it. The best way to do that is to make it look very different.
- New Interface = Users that need new training = $$$ for Microsoft
So Microsoft isn't in trouble then. But not all products have that luxury. Being a recent graduate, I was not immune from the poker virus that hit college campuses. Every once in a while I play online at PartyPoker. The other day I logged in, approved the mandatory software upgrade, and fired it up ready to play. When I opened it up, I almost gasped.
Old Interface


- Poker sites have a lot of competition and high turnover
- Most poker sites pretty much have the same features, games, and functionality.
- Users that need more training = Users that switch to another site
Every week, poker sites have promotional bonuses to try to drive people from one site to another. The only thing keeping users from switching is that they are comfortable with a site's look and feel. If you take that away from them, you're making it easier for them to switch to a site with a nice promotion. PartyPoker would have been a lot better off gradually adding in their new features and making sure that their users absorbed the changes as they came.
ESPN is a great example of user interface understanding. They are constantly adding new features (like streaming video) and changing the look and feel of their site, but in a controlled, conservative way.
Illustrating Imprecision with Excel
By Chris Gemignani
February 21, 2006
Find more about:
analytics
excel
screencast
A few days ago Zach made a nice point about Zillow. It's oh-so-easy to produce numbers that are precise but are not accurate. Here's a quick screencast to show you one fun way to draw the distinction in Excel using number formatting.
Click picture to view video.
Note: In the screencast, I say precision when I mean to say accuracy no fewer than *four* times. Sorry.
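The same idea carries over to code. As a rough sketch (not what the screencast does, just an analogous trick), you can round an estimate to a couple of significant figures before displaying it, so the number doesn't claim more precision than the analysis supports. The round_sig and format_estimate helpers and the sample value are hypothetical.

```python
import math


def round_sig(x, sig=2):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0
    # position of the leading digit decides how many places to keep
    digits = sig - int(math.floor(math.log10(abs(x)))) - 1
    return round(x, digits)


def format_estimate(value, sig=2):
    """Format a dollar estimate without implying false precision."""
    return "${:,.0f}".format(round_sig(value, sig))


if __name__ == "__main__":
    print(format_estimate(487213))  # $490,000
```

How many significant figures to keep is a judgment call; the point is that the displayed digits should roughly match how much you trust the estimate.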
Video of Excel 12 Business Intelligence Inaction
By Chris Gemignani
February 21, 2006
Find more about:
excel
screencast
Here's video of the new analytics capabilities coming in Excel 12, including the revisions to PivotTables. Microsoft is pushing hard to weave Excel, SQL Server, and SharePoint into an integrated system.
It's early, but I'm concerned that analysts will have to know even more to get useful work done. Analysts would benefit from PivotTables that are easier to use rather than PivotTables that require knowledge of SQL Server, SharePoint, Unified Data Models, etc.
If you're an analyst, check out the video and let us know if the new Excel approach would work in your organization. The video is 50 minutes long. Jump to 9 minutes in if you want to get past the intro chitter-chatter.
What is analytics?
By Zach Gemignani
February 20, 2006
Find more about:
analytics
bi
A reader wrote to us today:
I seem to have spent the last few days (not including the week-end I must add) trying to get to grips with 'Analytics'. If [my boss] comes in wanting a 5 word answer to his question "what exactly is an analytic?" I think I'd still be at a loss as to how to define it.
It's a great question. Analytics (along with its sister/twin term Business Intelligence) gets thrown around without much clarity as to its meaning. You might think with the word in our name, that we'd have long ago nailed down a definition. Not so. (Although we do have a good understanding of what "Juice" means?)
Below is my take on a "map" of the analytics world.
I used a couple of dimensions to help frame all the parts and pieces:
- Purpose. A concept of "exploration vs. control" highlights the difference between analysis and reporting. Analysis is about digging deep into data to discover relationships, find causation, and describe phenomena. Reporting, in contrast, is used to track performance and identify variation from goals.
- Timing. Most analytics is backward looking -- an attempt to understand what has happened, and therefore be equipped to make better decisions in the future. Alternatively, analytics can focus explicitly on predicting future performance or, in a few cases, provide information to support decisions in real-time.
I'd really appreciate any comments on this map -- whether I've missed/misgrouped/misrepresented concepts or alternative dimensions to describe the space. The more clarity we can provide in describing "what is analytics" the more palatable the concept will be.
14 comments
DEVI THIRUPATHI said:
Very good "map" of the analytical world. I am interested to associate the
phases of various analytical steps as shown in the map here to the
software tools that are available.
Analytic Process --> Software Tool
Is there any resource or web site that provides this.
Thanks and Regards
Devi
Zach said:
Great question. I haven't run across a good resource that lays out software tools by analytical process. We have used or checked out a bunch of different tools through the course of our work (e.g. Excel, Access, SAS, JMP, Tableau, GIS tools a-plenty, Business Objects, Cognos). Many of them can be stretched to cover different parts of this analytics landscape; few of them are very well targeted to solve a specific piece of the picture. Shoot me an e-mail if you have particular areas where you cannot find an appropriate tool.
KP said:
There is another classification to look at the analytical space
A classical 2X2 matrix with dimensions
Prescriptive vs Descriptive
Orgnl internal vs external
This kind of ties in with your dimensions
Anand said:
Very good presentation. I feel that the Target analysis and Top down analysis seems perfectly assigned to Exploration, but modelling/forecasting & scenario analysis/simulations should be shown under forecasting and not under exploration as these use the data which has been already explored.
Balaji Arun said:
Can someone suggest books on Analytics? (preferably covering the basics of analytics)
Zach said:
I haven't come across a book that I'd consider required reading for analytics, but here are a few that may be on point:
* I've read some good stuff about this book: Hard Facts, Dangerous Half-Truths And Total Nonsense: Profiting From Evidence-Based Management
* General guidance on thinking analytically: The Thinker's Toolkit : 14 Powerful Techniques for Problem Solving
* From our friend Stephen Few: Show Me the Numbers : Designing Tables and Graphs to Enlighten
* Finally, you may just want to get more adept with Excel using a book like this: Data Analysis and Decision Making with Microsoft Excel
Anthony Arrott said:
I would add two books to the list for Balaji Arun:
for clarity of thinking:
Gerd Gigerenzer's
"Calculated Risks - How to know when numbers deceive you"
for practical applications:
Bill Jelen's
"Guerilla Data Analysis Using Microsoft Excel"
Devi Thriupathi said:
Dear Mr. Zach,
In continuation of your email reply on analytic process --> software tool: you mentioned that there are different tools ranging from Excel, Access, SAS....., B.O., Cognos. Please mention the tools for the following applications:
Sales Forecasting
Account Management
Activity Based Costing
Capacity Planning
Inventory Management
Marketing
May I have your email address.
Thanks
Devi
Zach said:
My e-mail address: zach.gemignani@juiceanalytics.com
I wish there was an easy answer to your question about matching software tools to business applications...I don't think there is. Many of those items in your list require first a "business process" software application, then an associated reporting capability.
In many cases, you are talking about modeling activities. I haven't seen a better package for general purpose modeling than Excel.
When it comes to analyzing a database of customer interactions (like with Account management), we are intrigued by Tableau.
sudharshan sundarrajan said:
A pretty good diagram. I would like to add a new dimension (or maybe an implied one!) to the purpose. We normally classify analytics into 'Market analytics' and 'Risk analytics' in our organisation. Intelligent 'Market analytics' aids brilliantly in marketing and pro-active customer care. 'Risk analytics' deals with identifying potential risks, their 'riskiness' over a period of time, risk mitigation strategies and their effectiveness etc. 'Risk analytics' is slowly moving a lot of business decisions in a lot of organisations away from being affected by judgemental bias.
Mohan said:
I am not against analysis as a tool, but there is far too much analysis done in the belief that it will solve all business problems. Many managers feel that real-life business needs "Synthesis" more than analysis. All the parameters of the business environment can't be quantified, and many important ones are soft ones or intangibles that are difficult to quantify. I subscribe more to Alexander Christopher's philosophy, where synthesis is more important than analysis; unfortunately there is very little talk of it and even less training of managers. Our Management Institute has gone to the extent of even introducing a full-fledged MBA, i.e. Masters in Business Analysis. I am afraid too much analysis may lead to paralysis. In the end no mathematical model can replace human decision making, for which as yet no effective replacement has been found.
Deven said:
Hi Mohan,
Masters in Business Analysis sounds interesting. Can you please share more details of your Management institute?
Harry said:
Would you consider Predictive Analytics to cover any of the "risk analytics" that Sudharshan is talking about? Does it cover more than just the market side ?
Sateesh tadur said:
Going by the terminology used in business analytics, are there any statistical techniques that are used in the commercial context? I would like to know the specific multivariate techniques applied in this area.
Zillow's challenge: precision implies accuracy
By Zach Gemignani
February 17, 2006
Find more about:
analytics
Zillow released its home value assessment tool recently. It is a tantalizing concept: they claim to have put a dollar value on over 40 million homes across the country. I rushed to the site and was satisfied with the results for my house. Then I was overjoyed to find that the new bathroom we are adding in the basement will increase our home value by $85,000. Nice! Better yet, I found that if I just add five more bathrooms, I can double the value of my house. I guess buyers would agree with me: it is nice to have a bathroom nearby when you need it.
Numbers like these have made some people suspicious. A recent article in the Washington Post criticized Zillow for its inaccuracies:
Offering automated property valuations via the Internet turns out to be much harder than it seems -- especially if you expect them to be accurate. But after running extensive tests on this ambitious national real estate service, I found it to be so inaccurate that it's not useful.
The founder, Lloyd Frink, fully acknowledges the problems, but believes more information is better. It can only help, he argues, to give people more information in the confusing home buying or selling process.
Here's the problem (one I've run into many times in the world of analytics): if you present something with precision, your audience will believe your numbers are accurate. Particularly if you are backing it up with language like:
We compute this figure by taking zillions of data points — much of this data is public — and entering them into a formula...[it] is incredibly robust and sophisticated...Hundreds of home details feed into the formula and the home characteristics are given different weights according to their influence in a given geography and over a specific period of time.
There is a related phenomenon in software development -- The Iceberg Secret -- described by Joel Spolsky:
If you show a nonprogrammer a screen which has a user interface which is 100% beautiful, they will think the program is almost done.
If the front end looks nice, most people assume everything behind the scenes works well.
I feel for the statisticians at Zillow. Creating a database with a majority of home values within 10 or 20% of reality is a monumental task. Unfortunately, even that isn't good enough. It doesn't take many wildly inaccurate estimates to undermine the credibility of the whole tool.
I'm reminded of a story passed around in the consulting business: Imagine sitting down in your seat on a flight and noticing that the seat belt sign above your head doesn't work. The fact that some little light isn't working doesn't imply there is anything wrong with the airplane's engines, navigation system or anything that truly could impact your likelihood of arriving at your destination. But that little failure can make you nervous.
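One way to make the precision-versus-accuracy distinction concrete: treat precision as how tightly a set of estimates cluster and accuracy as how far their average sits from the true value. Here's a minimal sketch under that assumption. The function name and the sample numbers are made up for illustration and have nothing to do with Zillow's actual data or methods.

```python
import statistics


def spread_and_bias(estimates, true_value):
    """Return (spread, bias) for a list of estimates.

    spread: population standard deviation (small spread = precise)
    bias:   distance of the mean from the truth (small bias = accurate)
    """
    spread = statistics.pstdev(estimates)
    bias = abs(statistics.mean(estimates) - true_value)
    return spread, bias


if __name__ == "__main__":
    true_value = 100
    tight_but_off = [119, 120, 120, 121]   # precise, not accurate
    loose_but_centered = [80, 90, 110, 120]  # accurate on average, not precise
    print(spread_and_bias(tight_but_off, true_value))
    print(spread_and_bias(loose_but_centered, true_value))
```

The first set looks trustworthy because the numbers agree with each other, yet every one of them is far from the truth. That is the trap a precisely stated but inaccurate estimate sets for its audience.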
9 comments
precision is notthesame as accuracy said:
The difference between precision and accuracy is actually quite simple to explain using archery and a bullseye target as an illustration. If I can put all my arrows into the same point every time, I am very precise. However, I am only accurate if that point happens to be the bullseye. If not, then I am just precise but not accurate. If I am scattered, I am neither precise nor accurate. That's all there is to it. Accuracy and precision are two different things, mutually exclusive.
Chris said:
I like the analogy.
The difference between these two concepts can be confusing (if you watch my latest screencast, I say them backwards a number of times). The point of Zach's post is that if you report numbers with high precision, you may mislead people into thinking those numbers are accurate.
For instance, imagine you come back from the archery range and say, "I put all my arrows in a three inch diameter". I might be misled to think that you are an accurate archer. Your statement doesn't guarantee that.
HomePriceMaps said:
If you checked out Zillow and weren't happy with their tax assessed "zestimates" check out <a href="http://www.HomePr



