Annoying cyborgs attack, distort analytics [UPDATED][SOLVED-ish]

Over the last couple of weeks, I’ve been dealing with a strange phenomenon: a substantial (but not crippling) amount of traffic suddenly came our way.  The characteristics of this traffic are:

  • it’s direct (i.e., no referrer and not search traffic)
  • it’s all from IE browsers
  • it’s nearly all to the homepage
  • it’s widely distributed in terms of geography, network etc.
  • it’s of very poor quality — low time on site, very high bounce, very low engagement
  • it’s real: confirmed in multiple analytics packages
  • it flies under DDoS radar because it is less intense than a DDoS burst and hard to distinguish from real traffic.
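
To make the request-level traits above concrete, here is a rough Python sketch of a check for hits that are direct, from IE, and aimed at the homepage. The `Hit` record, its field names, and the `looks_like_cyborg` function are all hypothetical, and matching on the "MSIE" token is an assumption about how IE browsers of this era identify themselves. Real IE users who type the URL directly will match too, so treat a match as a candidate for closer inspection, not proof.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One page request. Field names are hypothetical, not from a real log format."""
    path: str
    referrer: str
    user_agent: str


def looks_like_cyborg(hit: Hit) -> bool:
    """Return True when a hit is direct, from IE, and to the homepage."""
    # Many log formats write "-" when no referrer was sent.
    is_direct = hit.referrer.strip() in ("", "-")
    # Assumption: the IE versions involved put "MSIE" in the user agent.
    is_ie = "MSIE" in hit.user_agent
    # Ignore any query string when checking for the homepage.
    is_homepage = hit.path.split("?", 1)[0] == "/"
    return is_direct and is_ie and is_homepage
```

The engagement traits (time on site, bounce) are not visible in a single request, so this only narrows the pool of suspects.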

This traffic simply started one day, and has fluctuated a little since.  Here’s what I’ve been able to conclude:

  • It’s likely not bot traffic in the traditional sense.  Assets such as the JavaScript and ads for the page are getting loaded along with the DOM.
  • It’s likely not human either: the pattern is too uniform and the quality is universally crappy.

This traffic has characteristics consistent with both bot and human behavior, so I think we should call it cyborg traffic!  The pattern is consistent with a voluntary browser-net of some sort (people whoring out their OSes to a central service; see Roger Dooley’s proposition below) or, less likely, some kind of malware that is involuntarily opening windows in users’ browsers.  If this behavior did not include older IE browsers, I’d also speculate that it could be related to prerendering, but that seems unlikely given the facts.

Others have noticed it too, some positing causes:

  • This thread on WebmasterWorld contains lots of people reporting and reflecting on the problem
  • Roger Dooley (the fellow who started that thread) has proposed with some good evidence that the whole thing is due to a shady entity called Gomez from a company called Compuware.  Roger currently seems to be waiting to hear back from these guys — I hope he does soon, and posts the results of any conversations.
  • A post appeared on the Google Analytics product forums reporting the same behavior
  • A response to the WebmasterWorld thread by @incredibill seems to indicate that he’s found a way, via the request headers, to distinguish this sort of traffic from human traffic.  Any chance you could share, Bill?

For updates on this situation, see Roger’s Post, or check back here — I’ll update when more info comes to light.

[UPDATE March 5th, 2012]

More consensus that this is a botnet, but little specific additional clarity about the nature of the traffic involved.  Good additional discussion appears here.

One affected soul even posted a rollup of their logs, with user agents:
https://analytics-a-googleproductforums-com.googlegroups.com/attach/5aade66b7c1d07b6/user_agents.csv?pli=1&view=1&part=4

 

[UPDATE March 7, 2012]
Here’s the first potentially reasonable mitigation I’ve come across (from the Google product group thread, above).

“BB_CCIT” Says:

We have been getting the same kind of traffic to our homepage now for 17 days. Slow enough that it doesn’t do anything but ruin our analytics and advertising impressions.

One way that we started filtering things out was…

1) If it is an internet explorer user
2) It has no referrer (direct traffic)

If so we mark the IP on our blacklist at the bottom of our fully loaded page. If we detect a mouse movement or click event using javascript, we then update our database and mark their IP address as a verified user via an ajax call. This filtering system basically allows the bot to visit our site once and after we blacklist them any re-visits to our site will receive a 404 page for them.

Even if a blacklist were not used, one could conditionally load analytics packages in this way … I think.
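
Here is a rough Python sketch of the flow BB_CCIT describes, as I read it: suspects are served once, revisits get a 404, and a mouse or click event clears the IP. The `VisitorGate` class, its method names, and the in-memory sets (standing in for their database) are all hypothetical. The "MSIE" user-agent check is my assumption about how they detect Internet Explorer.

```python
class VisitorGate:
    """In-memory stand-in for the blacklist database in the quoted comment."""

    def __init__(self) -> None:
        self.suspect = set()   # IPs seen as IE + direct, not yet verified
        self.verified = set()  # IPs that produced a mouse or click event

    def on_page_request(self, ip: str, user_agent: str, referrer: str) -> int:
        """Return the HTTP status to serve for this request."""
        if ip in self.verified:
            return 200
        if ip in self.suspect:
            return 404  # revisit from an IP that never showed interaction
        # Assumption: "MSIE" in the user agent identifies Internet Explorer.
        if "MSIE" in user_agent and not referrer:
            self.suspect.add(ip)  # the first visit is still served
        return 200

    def on_interaction(self, ip: str) -> None:
        """Called by the AJAX endpoint when the page reports mouse movement or a click."""
        self.suspect.discard(ip)
        self.verified.add(ip)
```

For the no-blacklist variant, `on_page_request` could always return 200 and the page could load the analytics snippet only once `on_interaction` has fired. Keying on IP also means visitors behind a shared address are treated as one, which may or may not be acceptable.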

Additional update:  Google seems to be investigating.  A Google staffer posted:

We’re still investigating this issue and I’ll keep you posted when there are further updates. We appreciate your patience.

[UPDATE April 27, 2012]  We’ve found a workable way to exclude this stuff from Analytics. Check it out here.

4 Responses

  1. Read this post:
    https://groups.google.com/a/googleproductforums.com/d/msg/analytics/BsZ41iF2iFM/qW9nBG6M80oJ

    Looks like a DoS botnet (that has also installed FunWebProducts toolbar).

    I am waiting for ISP disable this, and AV to update.

    Note: this is not site monitoring software (gomez/siteconfidence) as it is coming from a very diverse set of IP and user-agents.

    Thanks

    Phil.

  2. Thanks for this Phil … have you identified specific ways to distinguish this traffic from normal traffic? If so please let me know — our site is still affected. What’s more, I’d like to pass any specific info of this kind on to wordpress.com engineers who may want to know.

  3. Pingback: Annoying Robots: A Solution for Google Analytics | StkyWll
