
We love Google Earth because it puts the power to explore data in the hands of average folks. We’ve been exploring uses for census data and wanted to share some of this data with the world.

What you’re seeing here is a map of the counties in the United States colorized by median age. Lighter colors are older.

Median age in United States by county (lighter is older)

Census data is also available at the block group level, which is much, much more detailed.

Median age around Detroit, Michigan by census block group (lighter is older)

Without further ado, what follows are three sets of links for each state, which allow you to explore population density, median age, and male/female ratio at two levels of detail. Google Earth is required. We did have some FTP issues when uploading these files, so if you have any problems, let me know and I’ll re-upload the file.

Population Density

Lighter is higher population density (white is 800+ people per square mile); darker is lower population density (black is 2 or fewer people per square mile).

by County (overview):
Alabama Alaska Arizona Arkansas California Colorado Connecticut Delaware District of Columbia Florida Georgia Hawaii Idaho Illinois Indiana Iowa Kansas Kentucky Louisiana Maine Maryland Massachusetts Michigan Minnesota Mississippi Missouri Montana Nebraska Nevada New Hampshire New Jersey New Mexico New York North Carolina North Dakota Ohio Oklahoma Oregon Pennsylvania Rhode Island South Carolina South Dakota Tennessee Texas Utah Vermont Virginia Washington West Virginia Wisconsin Wyoming

by Census Block Group (fine detail):
Alabama Alaska Arizona Arkansas California Colorado Connecticut Delaware District of Columbia Florida Georgia Hawaii Idaho Illinois Indiana Iowa Kansas Kentucky Louisiana Maine Maryland Massachusetts Michigan Minnesota Mississippi Missouri Montana Nebraska Nevada New Hampshire New Jersey New Mexico New York North Carolina North Dakota Ohio Oklahoma Oregon Pennsylvania Rhode Island South Carolina South Dakota Tennessee Texas Utah Vermont Virginia Washington West Virginia Wisconsin Wyoming

Median Age

Lighter is older median age (white is 46.0 years median age); darker is younger median age (black is 29.0 years median age).

by County (overview):
Alabama Alaska Arizona Arkansas California Colorado Connecticut Delaware District of Columbia Florida Georgia Hawaii Idaho Illinois Indiana Iowa Kansas Kentucky Louisiana Maine Maryland Massachusetts Michigan Minnesota Mississippi Missouri Montana Nebraska Nevada New Hampshire New Jersey New Mexico New York North Carolina North Dakota Ohio Oklahoma Oregon Pennsylvania Rhode Island South Carolina South Dakota Tennessee Texas Utah Vermont Virginia Washington West Virginia Wisconsin Wyoming

by Census Block Group (fine detail):
Alabama Alaska Arizona Arkansas California Colorado Connecticut Delaware District of Columbia Florida Georgia Hawaii Idaho Illinois Indiana Iowa Kansas Kentucky Louisiana Maine Maryland Massachusetts Michigan Minnesota Mississippi Missouri Montana Nebraska Nevada New Hampshire New Jersey New Mexico New York North Carolina North Dakota Ohio Oklahoma Oregon Pennsylvania Rhode Island South Carolina South Dakota Tennessee Texas Utah Vermont Virginia Washington West Virginia Wisconsin Wyoming

Male/Female Ratio

Lighter means more men than women (white is 55% men); darker means more women than men (black is 45% men).

by County (overview):
Alabama Alaska Arizona Arkansas California Colorado Connecticut Delaware District of Columbia Florida Georgia Hawaii Idaho Illinois Indiana Iowa Kansas Kentucky Louisiana Maine Maryland Massachusetts Michigan Minnesota Mississippi Missouri Montana Nebraska Nevada New Hampshire New Jersey New Mexico New York North Carolina North Dakota Ohio Oklahoma Oregon Pennsylvania Rhode Island South Carolina South Dakota Tennessee Texas Utah Vermont Virginia Washington West Virginia Wisconsin Wyoming

by Census Block Group (fine detail):
Alabama Alaska Arizona Arkansas California Colorado Connecticut Delaware District of Columbia Florida Georgia Hawaii Idaho Illinois Indiana Iowa Kansas Kentucky Louisiana Maine Maryland Massachusetts Michigan Minnesota Mississippi Missouri Montana Nebraska Nevada New Hampshire New Jersey New Mexico New York North Carolina North Dakota Ohio Oklahoma Oregon Pennsylvania Rhode Island South Carolina South Dakota Tennessee Texas Utah Vermont Virginia Washington West Virginia Wisconsin Wyoming
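
Each of the three legends above maps a range of values onto a black-to-white ramp. Here is a minimal Python sketch of that kind of mapping. It assumes plain linear interpolation between the stated endpoints (we are not claiming this is the scale the files themselves use) and KML's aabbggrr hex color order. The function names gray_level and kml_gray are made up for illustration.

```python
def gray_level(value, black_at, white_at):
    """Map value onto 0..255, clamping anything outside [black_at, white_at]."""
    fraction = (value - black_at) / (white_at - black_at)
    fraction = max(0.0, min(1.0, fraction))
    return round(fraction * 255)


def kml_gray(value, black_at, white_at, alpha=255):
    """Return a KML color string (aabbggrr hex) for a gray shade."""
    level = gray_level(value, black_at, white_at)
    return "%02x%02x%02x%02x" % (alpha, level, level, level)


# Median age legend: black at 29.0 years, white at 46.0 years.
print(kml_gray(46.0, 29.0, 46.0))  # fully white
print(kml_gray(29.0, 29.0, 46.0))  # fully black
# Male/female legend: black at 45% men, white at 55% men.
print(kml_gray(0.50, 0.45, 0.55))  # a mid gray
```

Values beyond either endpoint are clamped, which matches legends like "800+ people per square mile" where everything above the threshold is drawn white.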

If you want to know more about Google Earth, check out our Absolutely Google Earth, a collection of tools and resources to get you started.

We’re working on a project to make this and other simple mapping applications more widely available. If you’re a Python guru who is interested in building great mapping applications like Chicago Crime, give me a jingle.




Yahoo recently released a nifty geocoder API that’s free for small (<50,000 lookups per day), non-commercial applications. Rasmus Lerdorf (Yahoo’s PHP king) has written a nice introduction to using this geocoder in your PHP apps. In that spirit, here’s a cheap and cheerful Python class that we use to geocode addresses.

from xml.dom.minidom import parse
from urllib.parse import urlencode
from urllib.request import urlopen


class GeocoderError(Exception):
    """Raised when the geocoding request or response parsing fails."""


class Geocoder:
    """
    Look up a location using the Yahoo geocoding API.

    Requires a Yahoo appid, which can be obtained at:
    http://developer.yahoo.net/faq/index.html#appid

    Documentation for the Yahoo geocoding API can be found at:
    http://developer.yahoo.net/maps/rest/V1/geocode.html
    """

    def __init__(self, appid, address_str):
        self.address_str = address_str
        self.addresses = []
        self.result_count = 0
        parms = {'appid': appid, 'location': address_str}

        try:
            url = 'http://api.local.yahoo.com/MapsService/V1/geocode?' + urlencode(parms)
            # parse the xml contents of the url into a dom
            dom = parse(urlopen(url))
            results = dom.getElementsByTagName('Result')
            self.result_count = len(results)
            for result in results:
                # one dict per match, seeded with the Result attributes
                d = {'precision': result.getAttribute('precision'),
                     'warning': result.getAttribute('warning')}
                for itm in result.childNodes:
                    # if precision is zip, the Address node has no text child
                    if itm.childNodes:
                        d[itm.nodeName] = itm.childNodes[0].data
                    else:
                        d[itm.nodeName] = ''
                self.addresses.append(d)
        except Exception as exc:
            raise GeocoderError(str(exc))

    def __repr__(self):
        s = "Original address:\n%s\n\n" % self.address_str
        s += "%d match(es) found:\n\n" % self.result_count
        for addr in self.addresses:
            s += """Match precision: %(precision)s
            Location: (%(Latitude)s,%(Longitude)s)
            %(Address)s
            %(City)s, %(State)s %(Zip)s
        """ % addr
        return s


if __name__ == "__main__":
    sample_addresses = ['555 Grove St. Herndon,VA 20170',
                        '1234 Greeley blvd, springfeld, va, 22152',
                        '50009']
    for addr in sample_addresses:
        g = Geocoder('YahooDemo', addr)
        print('-' * 80)
        print(g)

All you need to use this is a Yahoo application id.

You now have four different ways to geocode your company’s vital address. If you have suggestions or improvements, let us know. This code is public domain.
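
If you want to poke at the parsing step without hitting the network, something like the following sketch may help. It runs the same kind of loop over a canned XML string. The sample response is hypothetical: its shape is inferred from the fields the class above reads (Result, Latitude, Longitude, Address, City, State, Zip), and the values are made up. The helper name parse_results is ours, not part of any Yahoo library.

```python
from xml.dom.minidom import parseString

# Hypothetical response, shaped after the fields the Geocoder class reads.
# All values are made up for illustration.
SAMPLE_XML = """<ResultSet>
  <Result precision="zip" warning="">
    <Latitude>41.0</Latitude>
    <Longitude>-93.0</Longitude>
    <Address></Address>
    <City>Example City</City>
    <State>IA</State>
    <Zip>50009</Zip>
  </Result>
</ResultSet>"""


def parse_results(xml_text):
    """Return one dict per Result element in the XML text."""
    dom = parseString(xml_text)
    addresses = []
    for result in dom.getElementsByTagName("Result"):
        d = {"precision": result.getAttribute("precision"),
             "warning": result.getAttribute("warning")}
        for itm in result.childNodes:
            if itm.nodeType != itm.ELEMENT_NODE:
                continue  # skip whitespace text nodes between elements
            # an empty element (e.g. Address on a zip match) has no children
            d[itm.nodeName] = itm.childNodes[0].data if itm.childNodes else ""
        addresses.append(d)
    return addresses


addresses = parse_results(SAMPLE_XML)
print(addresses[0]["Zip"], repr(addresses[0]["Address"]))
```

Skipping non-element nodes is a small departure from the class above; it just keeps whitespace between tags from showing up as extra dictionary keys.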




We spent the last couple of days working with a client on displaying data for real-time dashboards. It got me to thinking: Are there implicit assumptions and mental habits that people bring to data interpretation? And if so, are there some basic practices to consider for visualizing data?

Which isn’t to say there is one right and perfect way to display any particular data; there is room both for creativity and structure. (Check out Information Aesthetics for examples of creative data visualization.) But in the world of management communication, it can’t hurt to be aware of your audience’s ingrained assumptions. You want the smoothest path to your important points. The risk is in missing your tiny window to focus a frazzled executive’s mind on your point and finding that your carefully constructed analysis gets sidetracked.

Here’s a starter list of these embedded assumptions:

1. Axes are often the last thing people look at in a chart. They expect time to progress from left to right and linear scales that start at zero. If two charts are adjacent, they will probably assume the axes and scales are the same. When it comes to the famous two-by-two consulting matrix, good things happen in the upper-right; bad things are in the lower-left. That said, I’m mystified that the famous BCG growth/share matrix insists on rejecting my new rule.

BCG Growth Share

2. Fluff. Dressing up your display implies you aren’t comfortable with the data’s ability to stand on its own or you don’t have much to say. This can include clip art, data incorporated into pictures, and animation. USA Today is particularly good at this. Check out a couple of examples from their Snapshots section. They have fewer than three numbers to communicate, but fill it up with eye-catching graphics.

USA Today Snapshot 1

USA Today Snapshot 2

3. Point of focus. Most data displays have a clear point of focus for the viewer, whether the presenter intends it or not. It could be the peak in a line chart, values crossing over zero, or a sudden change in values. In a chart like this (below), your intention may be to highlight the general growth trend — but you can’t avoid the inevitable questions about the drop after 2000. You can short-circuit these off-the-topic questions with an explanatory footnote or annotation. Ask yourself: what is the main point I want the reader to get, and what else will my data presentation imply?

Example graph

4. Proximity and size. Placing information close together suggests a connection. Sometimes accidental proximity can cause confusion. You might present two unrelated phenomena next to each other and the audience will automatically try to draw a connection (e.g. dogs have big teeth; teeth are good for crunching carrots. Audience thinks: dogs must like to crunch carrots). I just ran across Live Plasma, a great site that lets you enter a musical artist (or band, movie, director, or actor) then shows you related artists. The designers of this data visualization do a great job of building on our data display expectations by using size and proximity to show related artists.

Neil Young map




A while back we released a collection of tools and resources for Google Earth. We’ve restructured the page a bit and added a few new links. Check out the new version and make sure to let us know if you have anything to add.




A few days ago Zach made a nice point about Zillow. It’s oh-so-easy to produce numbers that are precise but are not accurate. Here’s a quick screencast to show you one fun way to draw the distinction in Excel using number formatting.

Click picture to view video.

Note: In the screencast, I say precision when I mean to say accuracy no fewer than *four* times. Sorry.
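
The screencast itself isn't reproduced here, but the same idea can be roughed out in Python: show an estimate with only as many significant figures as you can defend. This is a sketch, not the Excel technique from the video. The helper name round_sig and the sample figure are made up for illustration.

```python
import math


def round_sig(value, sig_figs):
    """Round value to the given number of significant figures."""
    if value == 0:
        return 0.0
    magnitude = math.floor(math.log10(abs(value)))
    return round(value, sig_figs - 1 - magnitude)


estimate = 347812.44  # made-up estimate, for illustration only

# Precise-looking: every digit shown, whether or not it is trustworthy.
print("$%.2f" % estimate)
# Honest about accuracy: keep two significant figures.
print("${:,.0f}".format(round_sig(estimate, 2)))
```

Both lines describe the same number; the second simply stops implying that the estimate is good to the penny.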




Here’s video of the new analytics capabilities coming in Excel 12, including the revisions to PivotTables. Microsoft is pushing hard to weave Excel, SQL Server, and SharePoint into an integrated system.

It’s early, but I’m concerned that analysts will have to know even more to get useful work done. Analysts would benefit from PivotTables that are easier to use rather than PivotTables that require knowledge of SQL Server, SharePoint, Unified Data Models, etc.

If you’re an analyst, check out the video and let us know if the new Excel approach would work in your organization. The video is 50 minutes long. Jump to 9 minutes in if you want to get past the intro chitter-chatter.




What is analytics?

Zach Gemignani

A reader wrote to us today:

I seem to have spent the last few days (not including the week-end I must add) trying to get to grips with ’Analytics’. If [my boss] comes in wanting a 5 word anaswer to his question “what exactly is an analytic?” I think I’d still be at a loss as to how to define it.

It’s a great question. Analytics (along with its sister/twin term Business Intelligence) gets thrown around without much clarity as to its meaning. You might think with the word in our name, that we’d have long ago nailed down a definition. Not so. (Although we do have a good understanding of what “Juice” means?)

Below is my take on a “map” of the analytics world.

Map of analytics

I used a couple of dimensions to help frame all the parts and pieces:

  • Purpose. A concept of “exploration vs. control” highlights the difference between analysis and reporting. Analysis is about digging deep into data to discover relationships, find causation, and describe phenomena. Reporting, in contrast, is used to track performance and identify variation from goals.

  • Timing. Most analytics is backward looking, in an attempt to understand what has happened and therefore be equipped to make better decisions in the future. Alternatively, analytics can focus explicitly on predicting future performance or, in a few cases, provide information to support decisions in real-time.

I’d really appreciate any comments on this map — whether I’ve missed/misgrouped/misrepresented concepts or alternative dimensions to describe the space. The more clarity we can provide in describing “what is analytics” the more palatable the concept will be.




Zillow released its home value assessment tool recently. It is a tantalizing concept: they claim to have put a dollar value on over 40 million homes across the country. I rushed to the site and was satisfied with the results for my house. Then I was overjoyed to find that the new bathroom we are adding in the basement will increase our home value by $85,000. Nice! Better yet, I found that if I just add five more bathrooms, I can double the value of my house. I guess buyers would agree with me: it is nice to have a bathroom nearby when you need it.

Numbers like these have made some people suspicious. A recent article in the Washington Post criticized Zillow for its inaccuracies:

Offering automated property valuations via the Internet turns out to be much harder than it seems — especially if you expect them to be accurate. But after running extensive tests on this ambitious national real estate service, I found it to be so inaccurate that it’s not useful.

The founder, Lloyd Frink, fully acknowledges the problems, but believes more information is better. It can only help, he argues, to give people more information in the confusing home buying or selling process.

Here’s the problem (one I’ve run into many times in the world of analytics): if you present something with precision, your audience will believe your numbers are accurate. Particularly if you are backing it up with language like:

We compute this figure by taking zillions of data points — much of this data is public — and entering them into a formula…[it] is incredibly robust and sophisticated…Hundreds of home details feed into the formula and the home characteristics are given different weights according to their influence in a given geography and over a specific period of time.

There is a related phenomenon in software development — The Iceberg Secret — described by Joel Spolsky:

If you show a nonprogrammer a screen which has a user interface which is 100% beautiful, they will think the program is almost done.

If the front end looks nice, most people assume everything behind the scenes works well.

I feel for the statisticians at Zillow. Creating a database with a majority of home values within 10 or 20% of reality is a monumental task. Unfortunately, even that isn’t good enough. It doesn’t take many wildly inaccurate estimates to undermine the credibility of the whole tool.
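
To make the "within 10 or 20% of reality" yardstick concrete, here is a small Python sketch. The home values are invented for illustration and have nothing to do with Zillow's data; the function name accuracy_summary is ours. It reports the share of estimates inside a tolerance alongside the single worst miss, which is the number that tends to sink credibility.

```python
def accuracy_summary(estimates, actuals, tolerance):
    """Return (share of estimates within tolerance, worst relative error)."""
    errors = [abs(est - act) / act for est, act in zip(estimates, actuals)]
    within = sum(1 for err in errors if err <= tolerance)
    return within / len(errors), max(errors)


# Invented numbers, for illustration only.
actual_values = [300000, 450000, 250000, 600000, 350000]
estimated_values = [310000, 430000, 260000, 1100000, 340000]

share, worst = accuracy_summary(estimated_values, actual_values, 0.10)
print("Within 10%%: %.0f%% of homes" % (share * 100))
print("Worst miss: %.0f%% off" % (worst * 100))
```

In this toy data set most estimates are close, yet one of them is off by far more than the tolerance, and that one is the estimate people will talk about.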

I’m reminded of a story passed around in the consulting business: Imagine sitting down in your seat on a flight and noticing that the seat belt sign above your head doesn’t work. The fact that some little light isn’t working doesn’t imply there is anything wrong with the airplane’s engines, navigation system or anything that truly could impact your likelihood of arriving at your destination. But that little failure can make you nervous.




Ripped from the headlines:

“To help offset gasoline prices, Budget Rent a Car is imposing an additional $9.50 charge on all vehicles driven fewer than 75 miles…”

“The new charge is aimed at renters who drive short distances and don’t fill up their tanks before they return because the gas gauge still reads “full,” even though the tank is a few gallons short. In the past, Budget filled the tank and billed the customer the highest rate. But now, Budget will impose the $9.50 charge even if the renter tops off the tank before returning the car. The charge will be removed only if customers show their gas receipt to a Budget agent, one traveler has already reported, slowing travelers often rushing to catch flights.”

“This is a convenience and time-saver for our customers,” said Susan McGowan, a spokeswoman for Cendant Corp., Budget’s parent company. “This is being done to recoup the cost of lost fuel.”

Tom Asacker’s definition of brand is “the expectation of someone or something delivering a certain feeling by way of an experience.” What feelings are Budget customers going to have about their experience? Four-letter feelings.

Budget’s misstep here feels like analytics gone wrong: a case where a spreadsheet exercise says “go, go, go!” while any sensible person would say “stop!”. As we wrote earlier today, focusing excessively on analytics means you focus less on customer service, innovation, and branding.




Thomas Davenport published an article in Harvard Business Review entitled “Competing on Analytics.” He concludes the article with a checklist of ten key points he feels are important to creating an analytics-based business.

We disagree with quite a few of these points, and even where we agree, we want to add real-world nuance.

The challenge of analytics is communication and creating a shared understanding. It’s about focusing on high impact areas, moving forward one step at a time, being skeptical, being creative, searching for the truth. Any company can compete on analytics, and you certainly don’t need to satisfy a checklist to do so.

Here’s Davenport’s checklist, with Juice commentary. We’re putting together a list of practical steps anyone can take.

1. You apply sophisticated information systems and rigorous analysis not only to your core capability but also to a range of functions as varied as marketing and human resources.

Analytics is hard. Analytics takes resources. It takes effort for an organization to create and assimilate learnings from analytics. You need to focus your analytics at the key leverage points of your business. As Davenport points out in the HBR article, UPS focuses its analytics on knowing where packages are, and Marriott focuses on revenue management. If you try to do everything, you won’t do anything well.

2. Your senior executive team not only recognizes the importance of analytics capabilities but also makes their development and maintenance a primary focus.

Of course analytics are good. But so are branding, innovation, operational excellence, and customer focus. Companies are defined by what they don’t do just as much as what they do. If you’re going to make analytics a primary focus, you will need to make sacrifices elsewhere. Which of the above are you willing to de-emphasize?

Capital One, oft cited as the credit card king of analytics, isn’t a customer service champion, nor is it particularly innovative.

3. You treat fact-based decision making not only as a best practice but also as a part of the culture that’s constantly emphasized and communicated by senior executives.

This is hard to argue with. However, it’s easier said than done. In our experience, getting to a culture of fact-based decision making requires your business to have real, solid wins using analytics to make people care from top to bottom.

4. You hire not only people with analytical skills but a lot of people with the very best analytical skills—and consider them a key to your success.

The problems raised by The Mythical Man-Month apply to analytics. Just as doubling the number of programmers on a project won’t halve the time it takes to complete a project, doubling the number of analysts won’t make your company twice as smart.

What you need are well placed and versatile analysts – analysts that are in constant communication and debate with key decision makers.

5. You not only employ analytics in almost every function and department but also consider it so strategically important that you manage it at the enterprise level.

What does this mean?

One thought: This refers to having a Chief (Analytics|Knowledge|Data) Officer. This may be a good idea. Here’s an interesting interview with Usama Fayyad, Yahoo’s Chief Data Officer, about the value of having a chief data herder at a data intensive company.

If, on the other hand, this means centralizing analytics and building a single data warehouse, we disagree. For most companies, building a big “atomic baloney slicer” for analytics is not going to work out. These approaches take too long, are inflexible, and don’t adapt to your business.

6. You not only are expert at number crunching but also invent proprietary metrics for use in key business processes.

Why is “proprietary” a good thing? What you do want is to develop a few metrics which are core to the success of your business. If you are in a well established industry, it’s likely those metrics have been defined and are well understood. There’s a lot of value in established metrics that everyone in your business understands. The challenge with analytics is communication and creating a shared understanding.

7. You not only use copious data and in-house analysis but also share them with customers and suppliers.

Insight is not measured by volume. As for sharing with customers and suppliers, it’s a rare company that has evolved that far (e.g. Toyota). Focus analytics where you have the most leverage to change your business.

8. You not only avidly consume data but also seize every opportunity to generate information, creating a “test and learn” culture based on numerous small experiments.

There are lots of ways to build insight from data. It can be test and learn, it can be customer visualization, it can be scoring systems.

9. You not only have committed to competing on analytics but also have been building your capabilities for several years.

Yes. Analytics is a learning process: a journey, not a destination. The best companies have been working on learning for a long time. That said, you can compete on analytics without having worked on it for years. Just get started.

10. You not only emphasize the importance of analytics internally but also make quantitative capabilities part of your company’s story, to be shared in the annual report and in discussions with financial analysts.

You risk hypocrisy if you follow this advice. Culture starts with internal stories. External stories will arise naturally and organically from internal stories. If you focus on external stories, the best you can hope for is to find yourself in a Harvard Business Review article.
