CodeWithMe Portland

So I’m at Code With Me in Portland this weekend. Code With Me is a great organization run by Tom Giratikanon and Sisi Wei that pairs technologists and journalists to spread the foundations of HTML, CSS and JavaScript. Here’s a picture of everyone in the big (windowless) conference room at the Oregonian.

Code With Me Portland

To understand what a dedicated group this is, you should probably also check out the current weather in Portland, which is — to say the least — spectacular.

weather

Events like this (in foul weather or fair) are really exciting.  You’d think there would be more opportunities for journalists and technologists to not only spend time together, but also trade knowledge and work together.  But really opportunities like these (open, learning, semi-structured events with lots of project time) are pretty rare, and I’m grateful to be involved this weekend.  I’m also psyched to be working with two journalists: Evonne Benedict of King5 TV in Seattle and Rachel Alexander of Whitman College and Union-Bulletin News.  Much fun!


Annoying Robots and the Chameleon Botnet

Yesterday Spider.io announced the discovery of the Chameleon Botnet, a mega-collection of infected Windows machines, based largely in the US. The purpose of the botnet is/was to generate large amounts of distributed, hard-to-detect traffic over sites displaying certain ads (or ads from certain networks), thereby generating bogus income via a network of 202 bogus content sites and defrauding advertisers out of something like $6 million/month. The characteristics of this botnet’s traffic are suspiciously similar to the sort of traffic I wrote about over a year ago when I worked at Grist. In particular, the traffic executes JavaScript, identifies itself using a particular Windows user agent, and is extremely homogeneous, which is basically what we saw at Grist. The Chameleon traffic differs in that it shows real but apparently random mouse and click events:

 

image:  spider.io

 Random distribution of clicks and mouse traces over a square ad on an infected site:  image: spider.io

… and apparently were confined to the part of the screen containing the ad.

We’ll probably never know if the Grist phenomenon was of this sort, but I think it might be worth developing some sort of detector for botnets of this type, in case they are affecting more sites than the small number that Spider.io’s report implies are affected by Chameleon. It seems to me that botnets of this sort have both an incentive and a disincentive to include non-target sites on their hitlist. The incentive is simply that by including legitimate sites they obfuscate their scheme from advertisers to some extent (though it’s unclear if most advertisers directly review the distribution patterns of their ads through networks). The disincentive is that targeting legitimate sites carries a risk of detection, though most sites would probably not notice if this were to start happening.
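As a starting point for that kind of detector, here is a minimal sketch of one possible heuristic: measure how much of a site’s direct (no-referrer) traffic comes from a single user agent. This is an assumption about what a first check could look like, not something taken from Spider.io’s report. The function names, the `userAgent` and `referrer` field names, and the threshold are all hypothetical.

```javascript
// Hypothetical heuristic: how concentrated are direct hits on one user agent?
// The hit shape ({ userAgent, referrer }) is an assumption for illustration.
function userAgentConcentration(hits) {
  var counts = Object.create(null); // no prototype, so any string is a safe key
  var direct = 0;
  var top = { userAgent: null, count: 0 };

  for (var i = 0; i < hits.length; i++) {
    if (hits[i].referrer) { continue; } // only look at direct hits
    direct++;
    var ua = hits[i].userAgent;
    counts[ua] = (counts[ua] || 0) + 1;
    if (counts[ua] > top.count) {
      top = { userAgent: ua, count: counts[ua] };
    }
  }

  return {
    directHits: direct,
    topUserAgent: top.userAgent,
    share: direct === 0 ? 0 : top.count / direct
  };
}

// Flags the sample when one user agent accounts for at least `threshold`
// (a fraction between 0 and 1) of the direct hits.
function looksHomogeneous(hits, threshold) {
  var result = userAgentConcentration(hits);
  return result.directHits > 0 && result.share >= threshold;
}
```

A high share on its own proves nothing (a small site can legitimately have a dominant browser), so a check like this is best treated as a prompt to look closer rather than a verdict.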

This was of great interest to me (and should really be to anyone who runs a site with significant traffic) because it’s the first public announcement of a botnet capable of running a complete client stack.

I would think that some analytics and advertising platforms like Google would be interested in understanding phenomena of this type better. I’d appreciate any links or info regarding countermeasures or detection of stuff like this.


How to make EE Fast(ER)

Tonight I’m talking to the cool-ass Seattle ExpressionEngine meetup about how to scale EE sites. It’s a bit of a quick tour of the subject, but I hope people find it useful. If you want to download the slides from this (.pdf) you can do so here. By the by, how much do I wish someone had written an add-on for EE like CE-cache a few years ago, when it was desperately needed at Grist …


A big change

For those of you who know me reasonably well, you’ll know that I really love what I do.

I love building stuff online, learning new technologies, thinking up new ways to cause change on the web. The fact that for the last four years I’ve been able to do pretty much all of this, all the time, in my work at Grist is something that fills me with a great deal of gratitude. Any list of people who have impacted me in my time at Grist will be incomplete (and so I’m loath to even try) but I’d be totally remiss if I did not mention the amazing mentors I’ve had here. These include folks like Scott Rosenberg, Rebecca Farwell, Lori Schmall, and (going a few years back) Dean Ericksen. And then there are the people I’ve been privileged to work with every day: Nathan Letsinger, Hanna Welch, and (more recently) Mignon Khargie and Mr. Ben Brooks. And then there’s Chip Giller, founder and CEO of Grist, and a person whose generosity, vision and faith have benefited me in many ways.

On September 13, I will be moving on from Grist to a new and totally amazing challenge at MoveOn.org. It involves lots of work with technology, data and audience-building.  It’s a daunting, but also very exciting challenge for me, and I promise to post more about that soon.

Right at this very moment, however, I’m thinking a lot about how I can remain connected to Grist in the future — as a fan, a reader, a community member, a friend, a (small-time) donor and (if they’ll let me) interloper at the Owl and Thistle Pub.


Twitter’s new API policy in plain English

Twitter announced new API policies for version 1.1 of their API today. The announcement was accompanied by a diagram which IMHO was a bit hard to understand at first and caused a bit of useless debate and worry on Twitter. Here’s my dumbed-down version of the diagram, or at least my understanding of it. The x axis represents who your application is for: the general public or, for lack of a better word, nerds (developers, business owners, etc.). The y axis represents what your application allows those people to do: either count stuff (tweets, links, etc.) or do stuff (tweet, search, etc.).

Here’s Twitter’s (better) version of this:


Grist at WCSF12

This weekend some of team Grist will be in SF for WordCamp. I’m really excited to get to give a talk about our journey to becoming a WordPress operation. This year (in fact, almost exactly this year, as WCSF11 represented a bit of an introduction to WordPress for us) has been a huge adventure: we learned lots about the WordPress API, moved our content and hosting to a new platform, adopted a new operating model, developed a theme and began to seriously grow our audience.

[UPDATE]

Here’s a dump of the slides I used in this presentation. Please feel free to hit me up on Twitter with any questions or to talk about any of this.


What you get when you move to the cloud


Grist’s former web hardware arrived at the office today.


Annoying Robots: A Solution for Google Analytics

Last month I posted about a surge of illegitimate traffic we’ve experienced on Grist.  Given that they did things like load JavaScript, these impressions were difficult to distinguish from real traffic, except they were all from IE and all of very low quality.

A large number of people who run websites are experiencing the same problem, which is only really a problem because it can massively distort analytics (Google Analytics, for example) and also skews AdSense to a destructive degree. While many affected folks have simply removed AdSense from the affected pages, until now I’ve seen no report of anyone excluding the traffic from Google Analytics.

We’ve just begun testing a solution that does this, and I’d like to post about it sooner rather than later so that others may both try it out and potentially benefit from it.

The premise of this solution came from a suggestion in this thread by Darrin Ward who suggested:

1) For IE users only, serve the page with everything loaded in a JS variable and do a document.write of it only when some mouse cursor movmement takes place (GA wouldn’t execute until the doc.write).
2) Use the same principle, but only load the GA code when a mouse movement takes place.
 

While we didn’t exactly do either of these things, we did take the idea of using DOM events that are indicative of a real human (mouse movement, keystroke) to differentiate the zombie traffic from the real.  The good news is that this seems — largely — to work.  Here’s how to do it:

 
 

1.  First of all, you must be using Google Analytics’s current (i.e., asynchronous) tracking method for this to make any sense. If you’re not, you probably should be anyway, so it’s a good time to quickly switch. Your page loads will improve if you do.

2.  We recommend as a first step that you implement some Google Analytics Events to differentiate good traffic from bad. This will continue tracking impressions on all page loads, but will fire off a special event that marks the good traffic. Later, once you are happy that the exclusion is happening properly, you can actually exclude impression tracking (see below).
To set up the events, insert this code in your site header after the code that loads Google Analytics:

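	//Evil Robot Detection

	var category = 'trafficQuality';
	var dimension = 'botDetection';
	// attachEvent (older IE) expects the 'on'-prefixed event names
	var human_events = ['onkeydown','onmousemove'];

	if ( navigator.appName == 'Microsoft Internet Explorer' && !document.referrer) {
		// Direct IE impression: wait for a sign of human activity
		for(var i = 0; i < human_events.length; i++){
			document.attachEvent(human_events[i], ourEventPushOnce);
		}
	}else{
		// Any other browser, or any visit with a referrer: record an event right away.
		// The final 'true' marks the event as non-interaction so bounce rates are not distorted.
		_gaq.push( [ '_trackEvent', category, dimension, 'botExcluded', 1, true ] );
	}

	function ourEventPushOnce(ev) {

		// Label the event with the DOM event that fired, e.g. 'onmousemove'
		_gaq.push( [ '_trackEvent', category, dimension, 'on' + ev.type, 1, true ] );

		// Detach both handlers so the event is pushed only once per page view
		for(var i = 0; i < human_events.length; i++){
			document.detachEvent(human_events[i], ourEventPushOnce);
		}

	} // end ourEventPushOnce()

	//End Evil Robot Detection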
This code pushes a GA event of category “trafficQuality” and dimension “botDetection” to Google Analytics, with a label that contains the type of DOM event (“onmousemove” or “onkeydown”), whenever a direct IE visitor moves the mouse or presses a key. It also pushes a “botExcluded” event with the same dimension and category for any non-IE browser and for any page view with a referrer. This means the only case in which you won’t get a Google Analytics event is a direct IE impression with no mousemove or keydown, which is what we want.
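To make the three outcomes easier to reason about, the branching can be restated as a small pure function. This is only a sketch for illustration and testing, not part of the tracking snippet itself; `classifyImpression` and its parameter names are hypothetical.

```javascript
// Pure restatement of the branching in the snippet above.
// Returns the event label that would be pushed, or null when no event is sent.
// classifyImpression and its parameters are hypothetical names.
function classifyImpression(appName, referrer, firstHumanEvent) {
  var isDirectIE = appName === 'Microsoft Internet Explorer' && !referrer;

  if (!isDirectIE) {
    return 'botExcluded';           // non-IE browser, or IE arriving with a referrer
  }
  if (firstHumanEvent) {
    return 'on' + firstHumanEvent;  // 'onmousemove' or 'onkeydown'
  }
  return null;                      // direct IE hit with no human activity
}
```

For example, `classifyImpression('Microsoft Internet Explorer', '', null)` returns `null`, which is the suspect case: a direct IE page view that never produced a mouse movement or keystroke.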
 
 

4.  So how does this help you?  Well, now in Google Analytics you’ll be able to tell the good traffic from the bad.  The good will have an event.  The bad won’t.  The easiest way to check this in Google Analytics is to check content -> events -> events overview.  Within a few hours of pushing the above code you should see events begin to accumulate there.

5.  To restore more sanity to your Google Analytics, you could also define a goal (under admin, go to goals and define a new goal like this:)

Once you implement this goal, Google Analytics will know what traffic has achieved the goal and what hasn’t. In effect, you’ve defined a conversion. This means that on any report in Google Analytics, you can restrict the view of the report to only those visits that converted. This is done in the advanced segments menu:

6.  Note that this affects only new data that enters Google Analytics — it does not scrub old data unfortunately.  In our case, it’s restored Google Analytics to its normal self after a couple of months of frustration.

7.  Eventually, you may want to stop Google Analytics from even recording an impression in the case of bad traffic.  To do that, just remove the

_gaq.push( [ '_trackEvent', ...

lines above and replace them with

_gaq.push(['_trackPageview']);

Of course, don’t forget to remove the call to _trackPageview from its normal place outside the conditional.
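Put together, step 7 might look something like the sketch below. It is one reading of the instructions above rather than the exact code running on Grist. The wrapper function `installPageviewGate` and its parameters are hypothetical, and are there only so the logic can be exercised outside a browser.

```javascript
// Sketch of step 7: send the pageview only after a sign of human activity.
// installPageviewGate and its parameter names are hypothetical; in a page you
// would pass the real navigator, document and _gaq objects.
function installPageviewGate(nav, doc, gaq) {
  var humanEvents = ['onkeydown', 'onmousemove'];

  function trackOnce() {
    gaq.push(['_trackPageview']);
    // Detach both handlers so only one pageview is recorded
    for (var i = 0; i < humanEvents.length; i++) {
      doc.detachEvent(humanEvents[i], trackOnce);
    }
  }

  if (nav.appName === 'Microsoft Internet Explorer' && !doc.referrer) {
    // Suspect traffic: wait for a keystroke or mouse movement
    for (var i = 0; i < humanEvents.length; i++) {
      doc.attachEvent(humanEvents[i], trackOnce);
    }
  } else {
    // Everything else is tracked immediately, as before
    gaq.push(['_trackPageview']);
  }
}
```

In a page this would be called as `installPageviewGate(navigator, document, _gaq);` in place of the usual `_gaq.push(['_trackPageview']);` line. Note that `attachEvent` and `detachEvent` are the older IE event API, which is why they appear only in the IE branch.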

I’d love to hear any ideas anyone has for improving this. We don’t use AdSense, but if you do, you could use this same technique to make the insertion of ad code into the DOM conditional on a human event.

Good luck bot killers!

[UPDATE May 8, 2012] Added the final argument to _trackEvent to prevent distortion of bounce rates. Thanks Chase!


Style Tiles


Catchy name for a good idea. “Style Guide” always seemed too vague and formal.


How to get up and running on Amazon EC2 quickly (for OSX people)

So I needed to set up my OSX rig to access AWS, spin up and configure an Ubuntu instance, install Apache, PHP, MongoDB and do various other tasks. Good thing I found these two great resources:

First, here’s Robert Sosinki with a great guide on how to get set up with the EC2 command line tools on Mac OSX. Really clear and well done.

Next, here’s a quick guide from RSM on how to turn that brand new instance into a full LAMP (that’s Linux, Apache, Mongo, PHP) stack … though really you could install whatever packages you need.

