Can Google predict the future?
Posted on Sunday the 17th of June, 2007
www.thedigitalfeed.co.uk/code/2007/06/17/can-google-predict-the-future
Google posted an AdSense blog entry in March stating that most misclicks by AdSense vendors were automatically discounted:
"...chances are we've already detected your clicks on your ads and discounted them."
I was surprised to see a few noted SEOs weigh in with the kind of naive responses I rarely see in the SEO space.
I would guess that Google can ID your click with 95% accuracy. The other 5% of the clicks, they can throw out on pure guesstimation. They don't have to be 100% certain to be able to toss a click out with a high degree of confidence that the click was errant.
How does Google know it is you clicking on your ads?
- Cookies
- Your IP matches IPs that have logged into the AdSense control panel, or a login to the panel matches a previous click on your site.
- Your page view behavior matches an owner's page view behavior. This is by far the most common method used by Google. It is easy to ID the owner of a site after very few page views. Google simply tracks your IP behavior as you view your own site and ads are served to you. Read some of the recent stuff on click fraud; it is pretty clear this is the top way Google is tracking bad clicks.
- Additionally, the majority of IPs on the cable networks are dynamic, but dynamic within a block. Thus, if Bob's ISP is Comcast, a Comcast address has viewed 200 pages on his site, the same C block has logged into his control panel, and the same D block is on the cookie, then, given his path behavior, it is a pretty safe bet that those clicks can be thrown out.
- Here is another one: let's say you are using a stock piece of blog software or a blog service. Many of those packages allow one template and one template only, so you serve Google ad code even on your blog admin panel. Google sees an attempt to load an ad from a restricted URL on your site and, presto, it has you. The number of blind URLs Google would have to check against would be fewer than 10 to match >90% of the major blogging software out there.
- Two words: Google Toolbar
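To make the list above concrete, here is a minimal Python sketch of how signals like these might be combined into a single "is this the owner?" score. Everything in it is an assumption made up for illustration: the signal names, the weights, the 0.5 threshold, and the reading of "C block" as a /24 network. Nothing here is taken from Google's actual system.

```python
import ipaddress

# Hypothetical weights for each signal; invented for illustration only.
WEIGHTS = {
    "cookie_match": 0.5,         # click carries the owner's cookie
    "same_ip_block": 0.3,        # click IP shares a block with a panel login
    "admin_url_seen": 0.4,       # ad code was loaded from a blog admin URL
    "owner_like_browsing": 0.3,  # page view pattern looks like an owner's
}


def same_c_block(ip_a: str, ip_b: str) -> bool:
    """Return True if two IPv4 addresses fall in the same /24 network."""
    network = ipaddress.ip_network(f"{ip_a}/24", strict=False)
    return ipaddress.ip_address(ip_b) in network


def owner_score(signals: dict) -> float:
    """Add up the weights of every signal that fired, capped at 1.0."""
    total = sum(weight for name, weight in WEIGHTS.items() if signals.get(name))
    return min(total, 1.0)


def should_discount(signals: dict, threshold: float = 0.5) -> bool:
    """Discount the click once the combined score reaches the threshold."""
    return owner_score(signals) >= threshold
```

The point of the sketch is that no single signal has to be conclusive. A dynamic IP in the right block plus owner-like browsing is already enough to cross the (assumed) threshold, which matches the idea that a click can be tossed without 100% certainty.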
Long story short, Google knows who you are from your click. That's not the question; the question is, even if they know it is you, how many do they let fly by without discounting them?
Everything talked about so far is child's play that any knowledgeable webmaster can duplicate. Now let's get a little more advanced.
Often overlooked is the widespread usage of Google AdSense code. That code is living on millions (perhaps billions) of pages. If you surf a lot of sites in a day, you are loading that code hundreds to thousands of times a day. As you load it, you are leaving a trail. Every time you load that code, you are leaving information on Google's ad servers. Sooner or later, those bits of information add up into a pattern that can be used to identify you with a high degree of accuracy.
For example, if Bob starts his typical morning run by surfing:
- foosite.com news
- bigsite.com blog
- fooweather.com weather
- bobs-site.com/wordpress/
Most people do something similar. A few to a dozen favorite sites and pages make up the average morning run for most internet users (especially webmasters). Even if Bob switches user agents, IPs, and even some of the sequence of his daily habits, there is little doubt Google could ID Bob out of millions of users, simply from his click and advertising behavior.
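One simple way to picture this kind of matching is set overlap: compare the sites seen in an anonymous session against stored per-user profiles and pick the closest. The Python sketch below uses Jaccard similarity for that. It is only an illustration; the function names, the PROFILES data, and the choice of Jaccard similarity are my assumptions, not anything Google has described.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(session_sites: set, profiles: dict) -> tuple:
    """Return (user, score) for the stored profile closest to this session."""
    user = max(profiles, key=lambda name: jaccard(session_sites, profiles[name]))
    return user, jaccard(session_sites, profiles[user])


# Hypothetical stored profiles, keyed by a made-up user label.
PROFILES = {
    "bob": {"foosite.com", "bigsite.com", "fooweather.com", "bobs-site.com"},
    "alice": {"foosite.com", "othersite.example"},
}

# An anonymous session with no cookie and a new IP, but familiar habits.
session = {"bigsite.com", "fooweather.com", "bobs-site.com"}
user, score = best_match(session, PROFILES)
```

Note that the comparison ignores user agent, IP, and visit order entirely, which is why changing those would not help Bob here. A real system would presumably weight rare sites more heavily than popular ones, since visiting bobs-site.com says far more about who you are than visiting a big news site.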
Deja vu? Any of this sound familiar? It is the same type of pattern recognition search engines use to find duplicate content on websites. Every time you load that AdSense code, on any site, you are leaving a bread crumb trail of information.
Again, dig up some of what Google has talked about recently at conferences in reference to ClickBot detection. It is fascinating to see just how far Google has gone in detecting users, bots, and mischief.
So we have gone from basic to advanced detection. Now, let's get leading edge by looking further at heuristic methods of prediction. If there is AdSense code on a few associated keyword sites, Google already knows:
- The path most users take when viewing those sites (due to tool bar and AdSense data).
- What sites most of those visitors visit.
- How long most users stay on those particular pages and sites.
- What type of advertising behavior those pages show.
- What language is on those sites.
- The income range of the audience.
- The sex and the age of the audience.
- The general psychographic makeup of the website audience, etc.
Essentially, Google knows that Bob likes roses, daisies, orchids, and wild flowers. Therefore, it is a good bet that he likes tulips as well.
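The flower example is the same reasoning a basic co-occurrence recommender uses: look at other people who share some of Bob's interests and see what else they like. Here is a minimal Python sketch of that idea. The function name, the scoring rule, and the sample profiles are assumptions invented for the example.

```python
from collections import Counter


def predict_interests(known: set, other_profiles: list, top_n: int = 1) -> list:
    """Guess new interests from people who share some of the known ones."""
    counts = Counter()
    for profile in other_profiles:
        overlap = len(known & profile)
        if overlap == 0:
            continue  # this person has nothing in common, so ignore them
        for item in profile - known:
            # Weight each candidate by how similar its owner is to our user.
            counts[item] += overlap
    return [item for item, _ in counts.most_common(top_n)]


# Bob's known interests, from the example above.
bob = {"roses", "daisies", "orchids", "wild flowers"}

# Hypothetical interest sets for four other users.
others = [
    {"roses", "tulips"},
    {"daisies", "orchids", "tulips"},
    {"roses", "cacti"},
    {"cars", "boats"},
]

guess = predict_interests(bob, others)
```

With this made-up data, "tulips" scores 3 (one shared interest with the first user, two with the second) against 1 for "cacti", so tulips is the prediction. The user interested in cars and boats shares nothing with Bob and contributes nothing.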
So what if Bob is on vacation in Paris:
- He visits a public Internet cafe
- He surfs a few of his favorite sites (not necessarily any of those from his morning run, but sites that run AdSense)
- He surfs a fresh new site that he has never seen before in this space
Now, here is the fun part. Google knows it is Bob. How?
I don't know the official name for this type of prediction, but it is a subset of psychographic behavioral targeting (Click Prediction?).
After you dig into this line of thinking, you have to start to conclude that:
- Google is much further along the path than this.
- Google's ability to "predict" user behavior is now a thousandfold what we are talking about here.
- Google's ability to track, interpret, predict, and act upon information is now in the scary all-seeing, all-knowing range.
The third point is the most interesting to me. How good has Google become at predicting events? Think about all the web data Google can synthesize.
- News tracking.
- Stock tracking around the world.
- Website tracking.
- Trend tracking.
- Event tracking.
- Gmail email reading.
- Blogs.
Imagine the associations that could be uncovered.