IronCache: Caching ExpressionEngine with memcached [UPDATED]

Update, Jan 2012:  Here’s an idea: ignore everything below and visit the IronCache page on GristLabs.  There, you can download IronCache rather than just reading about it!

I’m going to describe a general method of using memcached to implement whole-page caching for EE 1.6.x. It is somewhat complex, requires a small change to EE’s core and a fair number of configuration settings, and is therefore not suitable for most EE users. However, it may interest anyone running a large EE site or one that is subject to intense traffic spikes. The method was suggested to us by the folks at Automattic, makers of WordPress, and is based on Batcache, the WordPress plugin they use at massive scale to cache the blogs on wordpress.com.

The goal: we would like to use in-memory caching to store ENTIRE EE PAGES (not just templates) so that we can serve them quickly during a traffic spike. We would like this caching to kick in only when necessary (i.e., when a particular URL is subject to a spike) and only when the request is not associated with a logged-in session. We would also like only certain types of pages to be cached (i.e., not the control panel or any other page we want to keep free of caching).

Requirements: Before going any further, I’ll describe what you need for this. There are three things:

1. A memcached server.
2. The PHP memcache extension.
3. A small change to EE’s core (see below).

General Method:

We will define two types of cache objects, each of which is associated with a single URL.

COUNT: This object tracks how many times a URL has been accessed in the last N (configurable) seconds.
DATA: This object stores the page content. It has its own (also configurable) expiration time.
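To make the two objects concrete, here is a minimal sketch in PHP (the language EE itself is written in) of one way to key them and to maintain COUNT as a fixed-window counter. It is not the IronCache code: `ironcache_key()`, `ironcache_count_hit()` and the `FakeMemcache` class are hypothetical names, and the `prefix:type:md5(uri)` key scheme is an assumption. `FakeMemcache` mimics the `get`/`set`/`add`/`increment` methods of the PHP Memcache extension so the sketch runs without a server; in production you would pass a connected `Memcache` object instead.

```php
<?php
// Hypothetical in-memory stand-in for the PHP Memcache extension, covering
// only the methods this sketch uses. Expiry is checked lazily on read.
class FakeMemcache
{
    private $items = array(); // key => array(value, absolute expiry or 0 for none)
    public $clock = null;     // set in tests to fake the current time

    private function now()
    {
        return $this->clock !== null ? $this->clock : time();
    }

    public function get($key)
    {
        if (!isset($this->items[$key])) {
            return false;
        }
        list($value, $expiresAt) = $this->items[$key];
        if ($expiresAt !== 0 && $expiresAt <= $this->now()) {
            unset($this->items[$key]);
            return false;
        }
        return $value;
    }

    public function set($key, $var, $flag = 0, $expire = 0)
    {
        $expire = (int) $expire;
        $this->items[$key] = array($var, $expire > 0 ? $this->now() + $expire : 0);
        return true;
    }

    public function add($key, $var, $flag = 0, $expire = 0)
    {
        if ($this->get($key) !== false) {
            return false; // like Memcache::add, refuse to overwrite a live key
        }
        return $this->set($key, $var, $flag, $expire);
    }

    public function increment($key, $value = 1)
    {
        $current = $this->get($key);
        if ($current === false) {
            return false; // like Memcache::increment, never creates a key
        }
        // Keep the original expiry: incrementing does not extend the window.
        $this->items[$key][0] = (int) $current + $value;
        return $this->items[$key][0];
    }
}

// Key for a cache object of $type ('count' or 'data') for one URI (assumed scheme).
// Hashing the URI keeps keys short and free of whitespace, which memcached forbids.
function ironcache_key($prefix, $type, $uri)
{
    return $prefix . ':' . $type . ':' . md5($uri);
}

// Record one hit on $uri and return the hit count for the current window.
// The first hit creates COUNT with a TTL of $window seconds
// ($conf['ironcache_counter_reset']); later hits in the window increment it.
function ironcache_count_hit($mc, $prefix, $uri, $window)
{
    $key = ironcache_key($prefix, 'count', $uri);
    if ($mc->add($key, 1, 0, (int) $window)) {
        return 1;
    }
    $hits = $mc->increment($key);
    if ($hits === false) {
        // The key expired between add() and increment(): start a new window.
        $mc->set($key, 1, 0, (int) $window);
        return 1;
    }
    return (int) $hits;
}

// Example: two hits in one 10-second window, then one after it resets.
$mc = new FakeMemcache(); // in production: a connected Memcache object
$mc->clock = 1000;
echo ironcache_count_hit($mc, 'dev', '/news/story', 10), "\n"; // prints 1
echo ironcache_count_hit($mc, 'dev', '/news/story', 10), "\n"; // prints 2
$mc->clock = 1010;
echo ironcache_count_hit($mc, 'dev', '/news/story', 10), "\n"; // prints 1
```

The `add()`-then-`increment()` pattern matters because `Memcache::increment()` does not create a missing key, and because only the first hit in a window succeeds at `add()`, only that hit sets the TTL. The counter therefore resets `ironcache_counter_reset` seconds after the first hit rather than sliding with each request, which is cheap but only approximates "the last N seconds".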

SESSIONS_END:

Very early in EE’s processing (namely at the sessions_end hook), we check whether three conditions are true:

- Is this an anonymous session?
- Does the requested URI match one of a configurable list of patterns?
- Does a non-empty, non-expired DATA cache object exist for this URL?

If all three conditions are met, the extension immediately fetches the complete page from the DATA cache object, sends the appropriate headers followed by the content, and terminates the script. The result is a complete page served with very little work from either PHP or the database. Yay!

However, if there is a logged-in session, control returns to the normal EE process with no changes, so logged-in users have the same experience as always. Likewise, if the URI does not match one of the defined patterns, we proceed with vanilla EE; this keeps non-whitelisted pages out of the cache. In the remaining case (an anonymous session and a cache-eligible URI, but no DATA object or an expired one), the COUNT object is consulted. If the count is at or above a configurable threshold, a flag is set in the $SESS->session_cache object so that the page will be cached later (more on this in a second).

Finally, whenever we encounter a cache-eligible page, we increment the COUNT object associated with that URI.
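The sessions_end branching above reduces to a small decision function. The sketch below is hypothetical: the function name and action labels are illustrative, not IronCache's. It assumes the caller has already fetched the DATA object and the current COUNT value, and that COUNT is incremented after the threshold check and only on the pass-through paths, since a request served from cache terminates immediately.

```php
<?php
// Hypothetical sketch of the sessions_end decision; names are illustrative.
//   $anonymous  true when no member is logged in
//   $eligible   true when the URI matches the configured patterns
//   $cachedPage contents of the DATA object, or false on a miss/expiry
//   $hits       current COUNT value for the URI (0 when it does not exist)
//   $threshold  $conf['ironcache_threshold']
// Returns:
//   'bypass' run EE normally, no counting (logged in or not eligible)
//   'serve'  send $cachedPage with headers and end the script
//   'flag'   run EE normally, mark the page for caching at output_start,
//            then increment COUNT
//   'count'  run EE normally, then increment COUNT
function ironcache_sessions_end_action($anonymous, $eligible, $cachedPage, $hits, $threshold)
{
    if (!$anonymous || !$eligible) {
        return 'bypass';
    }
    if ($cachedPage !== false && $cachedPage !== '') {
        return 'serve';
    }
    return ((int) $hits >= (int) $threshold) ? 'flag' : 'count';
}

// Simulate four anonymous requests for one eligible URI with threshold 2,
// all within one counter window.
$hits = 0;
$cached = false;
for ($i = 1; $i <= 4; $i++) {
    $action = ironcache_sessions_end_action(true, true, $cached, $hits, 2);
    if ($action === 'flag' || $action === 'count') {
        $hits++; // the COUNT increment
    }
    if ($action === 'flag') {
        $cached = '<html>cached page</html>'; // stands in for output_start storing the page
    }
    echo "request $i: $action\n";
}
// Prints: request 1: count / request 2: count / request 3: flag / request 4: serve
```

With a threshold of 2, the first two hits only raise COUNT from 0 to 2, the third request sees a count at the threshold and is flagged so output_start stores it, and the fourth is served straight from the DATA object until it expires.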

OUTPUT_START:
Note: this step requires a small EE core change (see below).

The cache is populated at the start of the display_final_output method in core.output.php, just before the regular headers are sent. At this point, our extension checks for the flag in $SESS->session_cache. If it’s present, the extension stores the page it is about to send to the browser in the DATA object for that URI. As a result, the next request for that URI will find this cache (subject to expiration) at sessions_end.
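A minimal sketch of the output_start step, under the same caveats: `ironcache_output_start()` is a hypothetical helper, `$mc` is anything with a Memcache-style `set($key, $var, $flag, $expire)` (such as a connected `Memcache` instance), and the DATA key must be built exactly the way the sessions_end lookup builds it (here an assumed `prefix:data:md5(uri)` scheme).

```php
<?php
// Hypothetical sketch of the output_start step. $mc is anything with a
// Memcache-style set($key, $var, $flag, $expire), e.g. a connected Memcache
// object. The key scheme must match the one used for the sessions_end lookup;
// 'prefix:data:md5(uri)' is an assumption.
// Returns true if the page was stored, false otherwise.
function ironcache_output_start($mc, $flagged, $prefix, $uri, $page, $ttl)
{
    if (!$flagged || $page === '') {
        return false; // not flagged at sessions_end, or nothing worth caching
    }
    $key = $prefix . ':data:' . md5($uri);
    // Flag 0 stores the page uncompressed; $ttl is ironcache_cache_time.
    return $mc->set($key, $page, 0, (int) $ttl);
}
```

Passing `MEMCACHE_COMPRESSED` instead of 0 as the flag would trade a little CPU for smaller entries, which may help with large pages, since memcached rejects items above its configured maximum size (1 MB by default).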

Components and Configuration:

This method is implemented in an extension called IronCache, written by our team here at Grist. It requires the following configuration settings:

$conf['ironcache_enable_cache'] = 'y'; // 'y' turns the cache on; anything else (e.g. 'n') turns it off
$conf['ironcache_cache_time'] = '300'; // lifetime of a cached page (DATA object), in seconds
$conf['ironcache_counter_reset'] = '10'; // seconds before the hit counter (COUNT object) resets
$conf['ironcache_threshold'] = '2'; // if a page gets this many hits within one counter period, it is cached
$conf['ironcache_patterns'] = 'pattern1|pattern2|pattern3'; // pipe-separated list of patterns identifying cache-eligible pages
$conf['ironcache_cache_homepage'] = 'y'; // whether or not to cache '/'
$conf['ironcache_prefix'] = 'dev'; // prefix added to cache keys, so several applications can share one memcached server without name collisions
$conf['ironcache_memcache_host'] = 'localhost'; // memcached server host
$conf['ironcache_memcache_port'] = '11211'; // memcached server port
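The enable switch, the pipe-separated `ironcache_patterns` value and the homepage switch imply an eligibility test along these lines. This is a hypothetical sketch: `ironcache_is_eligible()` is an illustrative name, and treating each pattern as a plain substring of the URI is an assumption (the extension might, for example, use regular expressions instead).

```php
<?php
// Hypothetical sketch: decide whether a URI is cache-eligible from $conf.
// ASSUMPTION: each pipe-separated entry in ironcache_patterns is matched
// as a plain substring of the URI.
function ironcache_is_eligible($uri, array $conf)
{
    if (!isset($conf['ironcache_enable_cache']) || $conf['ironcache_enable_cache'] !== 'y') {
        return false; // caching switched off entirely
    }
    if (trim($uri, '/') === '') {
        // The homepage has its own switch.
        return isset($conf['ironcache_cache_homepage']) && $conf['ironcache_cache_homepage'] === 'y';
    }
    $patterns = isset($conf['ironcache_patterns']) ? explode('|', $conf['ironcache_patterns']) : array();
    foreach ($patterns as $pattern) {
        $pattern = trim($pattern);
        if ($pattern !== '' && strpos($uri, $pattern) !== false) {
            return true;
        }
    }
    return false;
}
```

Substring matching is loose: a pattern of `news` would also match `/member/newsletter`, so specific patterns (or anchored regular expressions checked with `preg_match()`) are safer.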

Core Change:

A small core change is required. In core.output.php, add the following code at the beginning of the display_final_output method:

// -------------------------------------------
// 'output_start' hook.
//  - override output behavior
//  - implement whole-page output caching
//
global $EXT; // harmless if the method already declares $EXT global
$edata = $EXT->universal_call_extension('output_start', $this);
if ($EXT->end_script === TRUE) return;
//
// -------------------------------------------

If anyone has any suggestions about how to avoid this core change, I’d love to hear them!

The Extension:

I’m planning to post a copy of the actual extension here sometime in the coming week. If you’d like a copy before then (minus some cleanup), I’m happy to email it to you: matt0perry [att] gmail [d0t] c0m

Ideas?  Comments?  Suggestions for Improvement?

Comment away.

Facebook Connect and ExpressionEngine: Two Options

It appears that EE developers now have two options for Facebook Connect. One is FabEE, developed by Purple Dogfish (UK). At Grist we’ve done quite a bit of tinkering with this add-on and like aspects of it (more on this in future posts, I hope). Just last week, however, Solspace released a beta version of its own Facebook Connect add-on. It looks quite promising, and given Solspace’s cachet I’d expect it to be quite successful. When it rains, it pours!

Here’s a basic feature comparison, but please check each product for the details:

FabEE ($59.95):

- Provides SSO and linking of EE accounts with Facebook accounts.

- Batch-sends account info to the Facebook application for quicker integration … hmmm.

- Possible to extend through custom plug-ins … this allows for the use of most or all of the available Facebook Connect API.

- Implements a number of template tags, including some special conditionals. Optionally includes jQuery 1.3. Some examples of available tags: {fabee:login_button}, {fabee:linked_profile_pic}, etc.

Solspace’s Facebook Connect Beta (also $59.95):

(Note: I haven’t used this yet; I’m just going off the Solspace docs.)

- Three modes: passive (provides SSO and creates shell EE accounts in the background as necessary), a mode that requires an EE account before syncing, and an intermediate mode that uses an abridged registration or sync. This seems cool.

- Also implements various template tags, including various Facebook member data tags. Could presumably be extended.

- Provides an account sync form tag to add or remove an account sync.

- Implements publishing of items to Facebook profiles.

It should be noted that both options require your EE instance to be open to the world (i.e., not behind a firewall or other barrier), since Facebook will actually ping your site under certain conditions (such as when someone deauthorizes your application on Facebook). Also, both add-ons are subject to Facebook’s terms of service, which are a bit hard to understand but include restrictions on caching data for more than 24 hours, along with other limits such as 10 posts per account per day.

(This little post is in part due to a tidbit fed to me by the inimitable Natebot — thanks Nathan!)