IronCache: Caching Expression Engine with memcached [UPDATED]

Update, Jan 2012: Here’s an idea: ignore everything below and visit the IronCache page on GristLabs. There, you can download IronCache rather than just reading about it!

I’m going to describe a general method of using memcached to implement whole-page caching for EE 1.6.x. It is somewhat complex, requires a small core change to EE and a fair amount of configuration, and is therefore not suitable for most EE users. However, it may be of interest to anyone running a large EE site or one that is subject to intense traffic spikes. This method was suggested to us by the folks at Automattic, makers of WordPress. It is based on the Batcache plugin for WordPress, which they use heavily to cache their blogs at wordpress.com.

The goal: We would like to use in-memory caching to store ENTIRE EE PAGES (not just templates) so that we can serve them quickly during a traffic spike. We would like this caching to kick in only when necessary (i.e., when a particular URL is subject to a spike) and only when the request is not associated with a logged-in session. We would also like only certain types of pages to be cached (i.e., not the control panel or any other page that should never be cached).

Requirements: Before going any further, I’ll describe the infrastructure elements necessary for this. There are three:

1. A memcached server.
2. The PHP memcache extension.
3. A small change to EE’s core (see below)

General Method:

We will define two types of cache objects, each of which is associated with a single URL.

COUNT: This object tracks how many times a URL has been accessed in the last N (configurable) seconds.
DATA: This object stores the page content. It has a separate (also configurable) cache expiration time.
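
As a rough illustration of the two objects, the sketch below builds one COUNT key and one DATA key per URL and bumps the counter inside a fixed time window. The key layout, the function names, and the ArrayCache class are all assumptions made for this example, not IronCache's actual code. ArrayCache is an in-memory stand-in that imitates the add/increment/get behaviour of the PHP Memcache class so the snippet runs without a memcached server.

```php
<?php
// In-memory stand-in for a memcached client (hypothetical, for illustration).
// It imitates Memcache::add(), ::increment() and ::get(): add() fails if the
// key exists, increment() fails if it does not, get() returns false on a miss.
// Unlike real memcached, a ttl of 0 is NOT treated as "never expire" here.
class ArrayCache
{
    public $now = 0;           // fake clock, in seconds
    private $items = array();  // key => array(value, expires_at)

    private function live($key)
    {
        return isset($this->items[$key]) && $this->items[$key][1] > $this->now;
    }

    public function add($key, $value, $flag, $ttl)
    {
        if ($this->live($key)) {
            return false;
        }
        $this->items[$key] = array($value, $this->now + $ttl);
        return true;
    }

    public function increment($key, $by = 1)
    {
        if (!$this->live($key)) {
            return false;
        }
        $this->items[$key][0] += $by;
        return $this->items[$key][0];
    }

    public function get($key)
    {
        return $this->live($key) ? $this->items[$key][0] : false;
    }
}

// Hypothetical key layout: prefix, object type, then a hash of the URL.
// Hashing keeps keys short and free of whitespace, which memcached rejects.
function ironcache_key($prefix, $type, $url)
{
    return $prefix . ':' . $type . ':' . md5($url);
}

// Count one hit. add() creates the counter with the window length as its
// expiry; if the counter already exists, add() fails and we increment instead.
function ironcache_hit($cache, $key, $window)
{
    if ($cache->add($key, 1, 0, $window)) {
        return 1;
    }
    return $cache->increment($key);
}

$cache = new ArrayCache();
$key   = ironcache_key('dev', 'count', '/news/big-story/');

$first  = ironcache_hit($cache, $key, 10); // new window starts
$second = ironcache_hit($cache, $key, 10); // same window
$cache->now = 11;                          // the 10-second window has expired
$third  = ironcache_hit($cache, $key, 10); // counter starts over
```

Letting the counter expire on its own means no separate reset job is needed. Against a real server there is a small race: the counter can expire between the failed add() and the increment(), in which case increment() returns false and that hit is simply not counted.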

SESSIONS_END:

Very early in EE’s process (namely at the sessions_end hook) we check to see if several conditions are true:

- is this an anonymous session?
- does the requested URI match a defined list of patterns (configurable)
- does a non-empty (non-expired) DATA cache object exist for this URL?

If all three of these conditions are met, EE immediately fetches the complete page content from the DATA cache object, sends the appropriate headers followed by the content, and terminates. This means a complete page is displayed with very little effort from either PHP or the database. Yay!

However, if there is a logged-in session, control is returned to the normal EE process with no changes, so logged-in users have the same experience as always. Likewise, if the URI does not match one of the defined patterns, we proceed with vanilla EE, which prevents non-whitelisted pages from being cached. In the case of an anonymous session and a cache-eligible URI but no (or an expired) DATA object, the COUNT object is consulted. If the count is at or above a configurable threshold, a flag is set in the $SESS->session_cache object so that the page will be cached later (more on this in a second).

Finally, whenever we encounter a cache-eligible page, we increment the COUNT cache object associated with that URI.
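
The branching described in this section can be sketched as a single function with three outcomes. This is only an illustration: the function name, the outcome constants, and the choice to pass the count in as a plain argument are assumptions, and the post does not say whether the count is read before or after the increment.

```php
<?php
// Possible outcomes of the sessions_end check (names are hypothetical).
define('IC_PASS',  'pass');   // run EE normally, do nothing else
define('IC_SERVE', 'serve');  // send the cached page and stop
define('IC_FLAG',  'flag');   // run EE normally, cache the output afterwards

// $anonymous: true when no member is logged in
// $eligible:  true when the URI matches one of the configured patterns
// $data:      cached page body, or false when missing or expired
// $count:     hits recorded for this URL in the current counter window
// $threshold: value of ironcache_threshold
function ironcache_decide($anonymous, $eligible, $data, $count, $threshold)
{
    if (!$anonymous || !$eligible) {
        return IC_PASS;   // logged-in user or non-whitelisted page
    }
    if ($data !== false && $data !== '') {
        return IC_SERVE;  // a non-empty, non-expired DATA object exists
    }
    // No usable DATA object: only cache if the URL is currently busy.
    return ($count >= $threshold) ? IC_FLAG : IC_PASS;
}

$logged_in = ironcache_decide(false, true, '<html>cached</html>', 9, 2);
$cached    = ironcache_decide(true, true, '<html>cached</html>', 9, 2);
$busy      = ironcache_decide(true, true, false, 2, 2);
$quiet     = ironcache_decide(true, true, false, 1, 2);
```

Keeping the decision free of side effects makes it easy to test on its own; the caller would then serve the page, set the flag, or fall through to EE, and increment COUNT for eligible pages.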

OUTPUT_START:
note: this bit requires a small EE core change … see below.

The cache is populated at the beginning of the core.output class, just before the regular headers are sent. At this point, our extension checks for the flag in $SESS->session_cache. If it’s present, it stores the page it is about to send to the browser in the DATA object for that URI. The result, of course, is that the next request for that URI will have this cache available (subject to expiration) at sessions_end.
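
A minimal sketch of this step follows. It assumes the flag is a key named 'ironcache_store' in the session cache array and that the page body is handed to the function as a string; both are invented for the example, and how the body is read from EE's Output object is not covered here. ArrayCache is a hypothetical in-memory stand-in whose set() and get() take their arguments in the same order as Memcache::set() and Memcache::get().

```php
<?php
// Tiny in-memory stand-in for a memcached client (hypothetical).
// get() returns false on a miss or after the entry has expired.
class ArrayCache
{
    public $now = 0;           // fake clock, in seconds
    private $items = array();  // key => array(value, expires_at)

    public function set($key, $value, $flag, $ttl)
    {
        $this->items[$key] = array($value, $this->now + $ttl);
        return true;
    }

    public function get($key)
    {
        if (isset($this->items[$key]) && $this->items[$key][1] > $this->now) {
            return $this->items[$key][0];
        }
        return false;
    }
}

// Hypothetical output_start handler. $session_cache stands in for
// $SESS->session_cache; the 'ironcache_store' flag name is invented here.
function ironcache_store_page($cache, $session_cache, $key, $page, $ttl)
{
    if (empty($session_cache['ironcache_store'])) {
        return false; // sessions_end did not ask for this page to be cached
    }
    return $cache->set($key, $page, 0, $ttl);
}

$cache = new ArrayCache();
$page  = '<html><body>Big story</body></html>';

// Not flagged: nothing is written.
$skipped = ironcache_store_page($cache, array(), 'dev:data:story', $page, 300);

// Flagged at sessions_end: the page is stored for 300 seconds.
$flagged = array('ironcache_store' => true);
$stored  = ironcache_store_page($cache, $flagged, 'dev:data:story', $page, 300);

$hit = $cache->get('dev:data:story');   // what the next request would find
$cache->now = 301;
$miss = $cache->get('dev:data:story');  // expired after the cache time
```

Because the write only happens when the flag is set, quiet pages never touch the DATA object, and the expiry on set() is what eventually sends a request back through the full EE process.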

Components and Configuration:

This method is implemented in an extension called IronCache, written by our team here at Grist. A number of configuration settings are required:

$conf['ironcache_enable_cache'] = 'y'; // 'y' means the cache is on; anything else, like 'n', means it's off.
$conf['ironcache_cache_time'] = '300'; // in seconds; how long a cached page lives
$conf['ironcache_counter_reset'] = '10'; // in seconds; how long the hit counter runs before it resets
$conf['ironcache_threshold'] = '2';  //if the page gets this number of hits in a given counter period, the page is cached.
$conf['ironcache_patterns'] = 'pattern1|pattern2|pattern3'; //a pipe-separated list of patterns for detecting cache-eligible pages
$conf['ironcache_cache_homepage'] = 'y'; // whether or not to cache '/'
$conf['ironcache_prefix'] = 'dev'; // prefix to add to cache elements.  Allows a single memcached server to be used by multiple applications without name collision
$conf['ironcache_memcache_host'] = 'localhost'; // memcached server host
$conf['ironcache_memcache_port'] = '11211'; // memcached server port
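
To make the ironcache_patterns and ironcache_cache_homepage settings concrete, here is one way the eligibility check could look. The post does not say how patterns are matched, so this sketch assumes each pipe-separated entry is a plain substring of the request URI; the function name is also hypothetical. If the entries were meant as regular expressions, the strpos() call would be swapped for preg_match().

```php
<?php
// Decide whether a URI is cache-eligible.
// Assumption: each pipe-separated entry is a plain substring to look for.
function ironcache_is_eligible($uri, $patterns, $cache_homepage)
{
    // The homepage has its own switch (ironcache_cache_homepage).
    if ($uri === '/' || $uri === '') {
        return $cache_homepage === 'y';
    }
    foreach (explode('|', $patterns) as $pattern) {
        if ($pattern === '') {
            continue; // ignore empty entries such as a trailing pipe
        }
        if (strpos($uri, $pattern) !== false) {
            return true;
        }
    }
    return false;
}

$article = ironcache_is_eligible('/article/2009/big-story/', 'article|feature', 'n');
$login   = ironcache_is_eligible('/member/login/', 'article|feature', 'n');
$home_on = ironcache_is_eligible('/', 'article|feature', 'y');
$home_off = ironcache_is_eligible('/', 'article|feature', 'n');
```

Treating the pattern list as a whitelist means anything not named, such as the control panel or member pages, is never cached by accident.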

Core Change:

A small core change is required. In core.output.php, at the beginning of the display_final_output method, add the following code:

// -------------------------------------------
// 'output_start' hook.
//  - override output behavior
//  - implement whole-page output caching
//
global $EXT; // make sure the extensions object is in scope at this point
$edata = $EXT->universal_call_extension('output_start', $this);
if ($EXT->end_script === TRUE) return;
//
// -------------------------------------------

If anyone has any suggestions about how to avoid this core change, I’d love to hear them!

The Extension:

I’m planning to post a copy of the actual extension here sometime in the coming week … if you’d like a copy before then (minus some cleanup) I’d be fine emailing it to you … matt0perry [att] gmail [d0t] c0m

Ideas?  Comments?  Suggestions for Improvement?

Comment away.

4 Responses

  1. You can do this without a core hack. The Output class is available in the sessions_start hook, so create an extension to use sessions_start, and in that method you’re calling, declare/load the current $OUT class, and assign it a new value, which will be your new custom output class that extends the original, and replaces the display_final_output method.

    class my_extension {

        function my_sessions_start() {
            global $OUT;
            $OUT = new My_Output;
        }

    }

    class My_Output extends Output {

        function display_final_output() {
            /* add your hook here */
        }

    }

    Copy the contents of that method into your new class, and add your hook. Now the 2nd hook you use in your extension can call this newly created hook. You can see this in use in my Custom Messages extension: http://brianlitzinger.com/ee/custom-system-messages

  2. Brian,

    Thanks for this useful tip. I’m cleaning up the extension now, and so I’ll include this! Really appreciate it. Have you done this sort of caching on an EE site before?

  3. No problem. Hopefully it works out. I’ve been trying this method on a couple of other ideas, but due to how the core is set up you can’t always extend one of the core classes, because it isn’t available to you early enough, or it’s redeclared at a later time and doesn’t check to see if it already exists. Fortunately, Output is one of the first things created, so you can do whatever you want with it.

    As for caching, no I haven’t done this sort of caching, but I’m going to be starting on a very large project soon, and this is the sort of thing I may need, so I’ll be keeping an eye on what the final product is.

  4. Also, the only problem with doing it this way is that only one extension, the one with the highest priority, gets to use the hooks, because if another extension extends the same core class, it’ll get blown away by the last one called.
