Annoying cyborgs attack, distort analytics [UPDATED][SOLVED-ish]

Over the last couple of weeks, I’ve been dealing with a strange phenomenon: a substantial (but not crippling) amount of traffic suddenly came our way.  The characteristics of this traffic are:

  • it’s direct (i.e., no referrer and not search traffic)
  • it’s all from IE browsers
  • it’s nearly all to the homepage
  • it’s widely distributed in terms of geography, network, etc.
  • it’s of very poor quality: low time on site, very high bounce rate, very low engagement
  • it’s real, confirmed in multiple analytics packages
  • it flies under the DDoS radar because it is less intense than a DDoS burst and otherwise hard to distinguish from real traffic.
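As a rough illustration, the per-request part of that pattern can be written as a filter over access-log fields. This is a minimal sketch, not something we actually ran: the function name and its inputs are hypothetical, and it only captures the IE / no-referrer / homepage signals, since time on site and bounce rate can't be judged from a single hit. Plenty of real visitors match it too, so a match is a reason to look closer, not proof of automation.

```python
from urllib.parse import urlsplit


def looks_like_cyborg_hit(user_agent: str, referrer: str, path: str) -> bool:
    """Flag a single hit matching the pattern above (hypothetical heuristic).

    Checks only three per-request signals: an IE user agent, an empty
    referrer, and a request for the homepage.
    """
    # Older IE user agents contain "MSIE"; IE 11 reports "Trident/" instead.
    is_ie = "MSIE" in user_agent or "Trident/" in user_agent
    # Direct traffic: no referrer at all.
    is_direct = not referrer.strip()
    # Treat "/" (or an empty path) as the homepage; query strings are ignored.
    is_homepage = urlsplit(path).path in ("", "/")
    return is_ie and is_direct and is_homepage


ie8 = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)"
print(looks_like_cyborg_hit(ie8, "", "/"))  # True
```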

This traffic simply started one day and has gone up or down a little since. Here’s what I’ve been able to conclude:

  • It’s likely not bot traffic in the traditional sense: assets such as the JavaScript and ads for the page are loaded along with the DOM.
  • It’s likely not human either: the pattern is too uniform and the quality universally crappy.

This traffic has characteristics consistent with both bot and human behavior; I think we should call it cyborg traffic! The pattern is consistent with a voluntary browser-net of some sort (people whoring out their machines to a central service; see Roger Dooley’s proposition below) or with some kind of malware that involuntarily opens windows in users’ browsers (less likely). If this behavior did not seem to include older IE browsers, I’d also speculate that it could be related to prerendering, but that seems unlikely given the facts.

Others have noticed it too, some positing causes:

  • This thread on WebmasterWorld contains lots of people reporting and reflecting on the problem.
  • Roger Dooley (who started that thread) has proposed, with some good evidence, that the whole thing is due to a shady entity called Gomez from a company called Compuware. Roger currently seems to be waiting to hear back from these guys; I hope he does soon and posts the results of any conversations.
  • A post appeared on the Google Analytics product forums reporting the same behavior.
  • A response to the WebmasterWorld thread by @incredibill seems to indicate that he’s found a way, via the request headers, to distinguish this sort of traffic from human traffic. Any chance you could share, Bill?

For updates on this situation, see Roger’s post, or check back here; I’ll update when more info comes to light.

[UPDATE March 5th, 2012]

There is more consensus that this is a botnet, but little additional clarity about the specific nature of the traffic involved. Good additional discussion appears here.

One affected soul even posted a rollup of their logs, with user agents:
https://analytics-a-googleproductforums-com.googlegroups.com/attach/5aade66b7c1d07b6/user_agents.csv?pli=1&view=1&part=4

 

[UPDATE March 7, 2012]
Here’s the first potentially reasonable mitigation I’ve come across (from the Google product forum thread above):

“BB_CCIT” Says:

We have been getting the same kind of traffic to our homepage now for 17 days. Slow enough that it doesn’t do anything but ruin our analytics and advertising impressions.

One way that we started filtering things out was…

1) If it is an internet explorer user
2) It has no referrer (direct traffic)

If so we mark the IP on our blacklist at the bottom of our fully loaded page. If we detect a mouse movement or click event using javascript, we then update our database and mark their IP address as a verified user via an ajax call. This filtering system basically allows the bot to visit our site once and after we blacklist them any re-visits to our site will receive a 404 page for them.
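A minimal sketch of that scheme as a server-side helper follows. The class and method names are hypothetical, state is kept in memory instead of the database BB_CCIT describes, and the browser-side part (a JavaScript listener that fires the AJAX call on the first mouse move or click) is assumed rather than shown.

```python
from dataclasses import dataclass, field


@dataclass
class InteractionGate:
    """Hypothetical in-memory version of the blacklist/verify scheme quoted above."""

    blacklist: set[str] = field(default_factory=set)
    verified: set[str] = field(default_factory=set)

    def handle_page_view(self, ip: str, user_agent: str, referrer: str) -> int:
        """Return the HTTP status code to serve for a page view."""
        if ip in self.verified:
            return 200
        if ip in self.blacklist:
            # Seen before as IE/direct traffic and never interacted: serve a 404.
            return 404
        is_ie = "MSIE" in user_agent or "Trident/" in user_agent
        if is_ie and not referrer:
            # First suspicious visit: serve the page normally, but blacklist
            # the IP until an interaction beacon clears it.
            self.blacklist.add(ip)
        return 200

    def handle_interaction_beacon(self, ip: str) -> None:
        """Called by the AJAX endpoint when a mouse move or click is reported."""
        self.blacklist.discard(ip)
        self.verified.add(ip)
```

Two caveats with keying on IP: visitors behind a shared address (an office NAT, a large ISP proxy) inherit each other's status, and nothing stops a bot from calling the verification endpoint itself.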

Even without a blacklist, one could use the same interaction check to load analytics packages conditionally … I think.
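The counting side of that idea can be sketched as a post-hoc filter over your own logs. This assumes each page view gets an ID and the interaction beacon reports the same ID; both the event layout and the function name are hypothetical. Views that never produce an interaction event are simply left out of the counts.

```python
from collections import Counter
from collections.abc import Iterable


def engaged_page_views(events: Iterable[tuple[str, str, str]]) -> Counter:
    """Count page views per path, keeping only views that saw an interaction.

    Each event is a hypothetical (page_view_id, kind, path) tuple, where
    kind is "view" or "interaction".
    """
    paths: dict[str, str] = {}
    engaged: set[str] = set()
    for view_id, kind, path in events:
        if kind == "view":
            paths[view_id] = path
        elif kind == "interaction":
            engaged.add(view_id)
    # Views with no interaction are dropped; interactions with no matching view are ignored.
    return Counter(paths[v] for v in engaged if v in paths)
```

The tradeoff is that real readers who never trigger an interaction event are undercounted along with the bots.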

Additional update: Google seems to be investigating. A Google staffer posted:

We’re still investigating this issue and I’ll keep you posted when there are further updates. We appreciate your patience.

[UPDATE April 27, 2012]  We’ve found a workable way to exclude this stuff from Analytics. Check it out here.


14 Comments on “Annoying cyborgs attack, distort analytics [UPDATED][SOLVED-ish]”

  1. Phil says:

    Read this post:
    https://groups.google.com/a/googleproductforums.com/d/msg/analytics/BsZ41iF2iFM/qW9nBG6M80oJ

    Looks like a DoS botnet (that has also installed FunWebProducts toolbar).

    I am waiting for ISPs to disable this, and for AV vendors to update.

    Note: this is not site monitoring software (Gomez/SiteConfidence), as it is coming from a very diverse set of IPs and user agents.

    Thanks

    Phil.

  2. Matt says:

    Thanks for this, Phil … have you identified specific ways to distinguish this traffic from normal traffic? If so, please let me know; our site is still affected. What’s more, I’d like to pass any specific info of this kind on to WordPress.com engineers who may want to know.

  3. [...] month I posted about a surge of illegitimate traffic we’ve experienced on Grist.  This traffic was [...]

  4. Matt says:

    I just updated this post to reflect the fact that we’ve found a pretty good way to mitigate this in the case of Google Analytics; in other words, we now keep the bad impressions out of GA. Check it out here:

    http://stkywll.com/2012/04/27/annoying-robots-a-solution-for-google-analytics/

  5. [...] The characteristics of the traffic of this botnet are suspiciously similar to the sort of traffic I wrote about over a year ago when I worked at Grist. In particular, the traffic instantiates JavaScript, identifies itself [...]

  11. ktmnecro says:

    Have been experiencing this for a year on one of our URLs. We get about 15k requests per day from thousands of IP addresses, mostly from end-user ISPs all over the world (mostly US). The user agent is a mix of real Windows agents. This really does seem like end-user PCs being used for some automated task. Why they are hitting one of our URLs so often, and for a year now, beats me.
    I checked whether Gomez was behind this by first calling them and asking if they have our domain in their system. They said they did not, and also told me that their “last mile” service would have a “gomez” identifier in the user agent. I wanted to verify this, so I downloaded and installed their client program, which is called gomezpeerzone, activated it, and then tcpdumped the traffic. Indeed, their control system used it to hit lots of sites, but in the captured traffic all of their requests did have “gomez” in the user agent.
    So I’m still stumped.

    • Matt says:

      Yup, that sounds exactly like what was happening to us … aside from the mix of user agents. Do you know anything about the traffic? Does it initiate any DOM events?

      • ktmnecro says:

        It loads and executes the Google Analytics JavaScript, so I imagine it’s a fully functional browser.
        In our case it’s isolated to a single URL (some older article/page), but more pages were experiencing this when I initially detected it last year.
        Last year I decided to just 404 that page for everyone, in the hope that their system would go away, and also because I didn’t want the analytics bloat. But when I revisited this issue the other day, to my surprise, the traffic was still there: about 10 requests per minute, all day long.
        The Gomez trail seems cold, but as I researched these Gomez-type “make money at home free” clients, I found hundreds of them that are used to buy/trade traffic.
        I also thought that maybe this is some botnet control mechanism: your botnet nodes hit some public website URL constantly, and if you need to “communicate” with your botnet (maybe to reset a password or something), you leave some cryptic message in the comments of the page your bots are scraping. But then why the JS execution?
        In any case, I’ve started logging everything now, and I’ll do some analysis in the next week.

      • Matt says:

        Hey –

        Very interesting stuff. In our case we were able to distinguish the traffic by its complete lack of DOM activity, i.e., no mouse events, no key presses, etc. The solution for doing that is here: http://stkywll.com/2012/04/27/annoying-robots-a-solution-for-google-analytics/

        My thought was that this always had something to do with advertising. Do you run ads? The Chameleon botnet had a very similar character to what we experienced, and I wrote a bit about that here: http://stkywll.com/2013/03/20/annoying-robots-and-the-chameleon-botnet/

