Update, Jan 2012: Here’s an idea: ignore everything below and visit the IronCache page on GristLabs. There, you can download IronCache rather than just reading about it!
I’m going to describe a general method of using memcached to implement whole-page caching for EE 1.6.x. It is somewhat complex and requires a small core change to EE plus a fair number of configuration settings, so it is not suitable for most EE users. However, it may be of interest to anyone running a large EE site or one that is subject to intense traffic spikes. This method was suggested to us by the folks at Automattic, makers of WordPress. It is based on the Batcache plugin for WordPress, which they use heavily to cache the blogs at wordpress.com.
The goal: We would like to use in-memory caching to store ENTIRE EE PAGES (not just templates) so that these pages can be served quickly during a traffic spike. We would like this caching to kick in only when necessary (i.e., when a particular URL is subject to a spike) and only when the request is not associated with a logged-in session. We would also like only certain types of pages to be cached (i.e., not the control panel or any other page we want to keep free from caching).
Requirements: Before going any further, I’ll describe the infrastructure elements necessary for this. There are three:
1. A memcached server.
2. The PHP memcache extension.
3. A small change to EE’s core (see below)
We will define two types of cache objects, each of which is associated with a single URL.
COUNT: This object tracks how many times a URL has been accessed in the last N (configurable) seconds.
DATA: This object stores the page content. It has a separate (also configurable) cache expiration time.
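To make the two object types concrete, here is a minimal sketch of how their keys might be derived from a URL. The function name `ironcache_key`, the `count`/`data` naming scheme, and the use of `md5` are assumptions for illustration and not necessarily what the extension does; the prefix and the two lifetimes simply echo the kind of values the configuration allows.

```php
<?php
// Hypothetical helper: builds a memcached key for one URL.
// $type is 'count' or 'data'; $prefix keeps applications that share
// one memcached server from colliding.
function ironcache_key($prefix, $type, $url)
{
    // md5 keeps the key short and free of whitespace, which memcached
    // does not allow in keys (keys are also limited to 250 bytes).
    return $prefix . ':' . $type . ':' . md5($url);
}

$url       = '/blog/entry/hello-world/';
$count_key = ironcache_key('dev', 'count', $url);
$data_key  = ironcache_key('dev', 'data', $url);

// Each object type has its own lifetime in seconds (illustrative values).
$ttl = array('count' => 10, 'data' => 300);
```

Keeping the two lifetimes separate is what lets the counter reset quickly while a cached page lives for several minutes.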
Very early in EE’s process (namely at the sessions_end hook) we check to see if several conditions are true:
- is this an anonymous session?
- does the requested URI match a defined list of patterns (configurable)?
- does a non-empty (non-expired) DATA cache object exist for this URL?
If all three of these conditions are met, EE immediately fetches the complete page content from the DATA cache object, sends the appropriate headers followed by the content to the browser, and terminates. This means we have a complete page displayed with very little effort from either PHP or the database. Yay!
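The three checks can be sketched roughly as follows. This is an illustration under stated assumptions, not the extension's code: `ironcache_lookup` is a hypothetical name, a plain array stands in for memcached, and the pipe-separated patterns are assumed to be regular-expression fragments.

```php
<?php
// Hypothetical sketch of the early check. $cache is a plain array standing
// in for memcached; each entry holds a body and an absolute expiry time.
// Returns the cached page, or null when regular EE should handle the request.
function ironcache_lookup($is_anonymous, $uri, $patterns, $cache, $now)
{
    if (!$is_anonymous) {
        return null; // logged-in users always get regular EE
    }
    // 'pattern1|pattern2' is used as one alternation. '#' is the regex
    // delimiter, so any '#' inside the patterns is escaped first.
    $regex = '#' . str_replace('#', '\#', $patterns) . '#';
    if (!preg_match($regex, $uri)) {
        return null; // URI is not on the whitelist
    }
    $key = 'data:' . md5($uri);
    if (!isset($cache[$key])
        || $cache[$key]['expires'] <= $now
        || $cache[$key]['body'] === '') {
        return null; // missing, expired, or empty DATA object
    }
    return $cache[$key]['body'];
}

$cache = array(
    'data:' . md5('/blog/hello/') => array(
        'body'    => '<html>cached</html>',
        'expires' => 1300,
    ),
);
$hit = ironcache_lookup(true, '/blog/hello/', 'blog|news', $cache, 1000);
```

In the real extension a non-null result would be followed by sending headers, echoing the body, and ending the script.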
However, if there is a logged-in session, control is returned to the normal EE process with no changes, so logged-in users have the same experience as always. Likewise, if the URI does not match one of the defined patterns, we proceed with vanilla EE, which prevents non-whitelisted pages from being cached. In the case of an anonymous session and a cache-eligible URI but no (or an expired) DATA object, the COUNT object is consulted. If the count is at or above a configurable threshold, a flag is set in the $SESS->session_cache object so that the page will be cached later (more on this in a second).
Finally, whenever we encounter a cache-eligible page, we increment the COUNT cache object associated with that URI.
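Here is a small sketch of the counting logic, again with an array standing in for memcached. The function name `ironcache_hit` is hypothetical, and the order shown (consult the count, then increment it) is one reading of the description above; the extension may order these steps differently.

```php
<?php
// In-memory stand-in for the COUNT objects:
// key => array('n' => hits so far, 'expires' => timestamp).
$counts = array();

// Returns true when the page should be flagged for caching on this request.
function ironcache_hit(&$counts, $key, $threshold, $window, $now)
{
    // A missing or expired counter starts a fresh window.
    if (!isset($counts[$key]) || $counts[$key]['expires'] <= $now) {
        $counts[$key] = array('n' => 0, 'expires' => $now + $window);
    }
    $should_cache = $counts[$key]['n'] >= $threshold;
    $counts[$key]['n']++; // every cache-eligible request is counted
    return $should_cache;
}

// Threshold of 2 hits within a 10-second window (illustrative values).
$r1 = ironcache_hit($counts, 'count:/blog/', 2, 10, 100); // false
$r2 = ironcache_hit($counts, 'count:/blog/', 2, 10, 101); // false
$r3 = ironcache_hit($counts, 'count:/blog/', 2, 10, 102); // true
$r4 = ironcache_hit($counts, 'count:/blog/', 2, 10, 120); // false, window reset
```

With a real memcached server the window would not need to be tracked by hand: the counter's expiration time does that job, since memcached drops the key once it expires.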
Note: this bit requires a small EE core change (see below).
The cache is populated at the beginning of the core.output class, just before the regular headers are sent. At this point, our extension checks for the flag in $SESS->session_cache. If it’s present, it stores the page it is about to send to the browser in the DATA object for that URI. The result, of course, is that the next request for that URI will have this cache available (subject to expiration) at sessions_end.
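The population step might look something like this. The handler name, the flag name `ironcache_cache_page`, and the array-based store are all assumptions made for the sketch; only the overall flow (check the flag, then store the outgoing page) comes from the description above.

```php
<?php
// Hypothetical handler for the output hook. $session_cache stands in for
// $SESS->session_cache and $store stands in for memcached.
// Returns true if the page was stored, false otherwise.
function ironcache_output_start($session_cache, $uri, $page, &$store, $ttl, $now)
{
    // The flag is only present when the earlier check decided this
    // request should populate the cache.
    if (empty($session_cache['ironcache_cache_page'])) {
        return false;
    }
    $store['data:' . md5($uri)] = array(
        'body'    => $page,
        'expires' => $now + $ttl,
    );
    return true;
}

$store   = array();
$flagged = array('ironcache_cache_page' => true);

$stored  = ironcache_output_start($flagged, '/blog/', '<html>page</html>', $store, 300, 1000);
$skipped = ironcache_output_start(array(), '/news/', '<html>news</html>', $store, 300, 1000);
```

Because the page is stored just before it is sent, the request that populates the cache still gets a normally rendered response.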
Components and Configuration:
This method is implemented in an extension called IronCache, written by our team here at Grist. A number of configuration settings are required:
$conf['ironcache_enable_cache'] = 'y'; // 'y' means the cache is on; anything else, like 'n', means it is off
$conf['ironcache_cache_time'] = '300'; // the page cache life, in seconds
$conf['ironcache_counter_reset'] = '10'; // the number of seconds it takes for the counter to reset
$conf['ironcache_threshold'] = '2'; // if the page gets this number of hits in a given counter period, the page is cached
$conf['ironcache_patterns'] = 'pattern1|pattern2|pattern3'; // a pipe-separated list of patterns for detecting cache-eligible pages
$conf['ironcache_cache_homepage'] = 'y'; // whether or not to cache '/'
$conf['ironcache_prefix'] = 'dev'; // prefix to add to cache elements; allows a single memcached server to be used by multiple applications without name collisions
$conf['ironcache_memcache_host'] = 'localhost'; // memcached server host
$conf['ironcache_memcache_port'] = '11211'; // memcached server port
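Since every setting is a string, code that consumes them will probably want typed values. The helper below is a hypothetical sketch of that normalization (the name `ironcache_settings` and the shape of the returned array are assumptions, not part of the extension):

```php
<?php
// Hypothetical helper: converts the raw string settings into typed values.
function ironcache_settings($conf)
{
    return array(
        // Only the exact string 'y' switches a feature on.
        'enabled'        => $conf['ironcache_enable_cache'] === 'y',
        'cache_homepage' => $conf['ironcache_cache_homepage'] === 'y',
        // Durations and the threshold become integers.
        'cache_time'     => (int) $conf['ironcache_cache_time'],
        'counter_reset'  => (int) $conf['ironcache_counter_reset'],
        'threshold'      => (int) $conf['ironcache_threshold'],
        // The pipe-separated list becomes an array of patterns.
        'patterns'       => explode('|', $conf['ironcache_patterns']),
        'prefix'         => $conf['ironcache_prefix'],
        'memcache_host'  => $conf['ironcache_memcache_host'],
        'memcache_port'  => (int) $conf['ironcache_memcache_port'],
    );
}

$conf = array(
    'ironcache_enable_cache'   => 'y',
    'ironcache_cache_time'     => '300',
    'ironcache_counter_reset'  => '10',
    'ironcache_threshold'      => '2',
    'ironcache_patterns'       => 'pattern1|pattern2|pattern3',
    'ironcache_cache_homepage' => 'y',
    'ironcache_prefix'         => 'dev',
    'ironcache_memcache_host'  => 'localhost',
    'ironcache_memcache_port'  => '11211',
);
$settings = ironcache_settings($conf);
```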
A small core change is required. In core.output.php, at the beginning of the display_final_output method, add the following code:
// -------------------------------------------
// 'output_start' hook.
//  - override output behavior
//  - implement whole-page output caching
//
$edata = $EXT->universal_call_extension('output_start', $this);
if ($EXT->end_script === TRUE) return;
//
// -------------------------------------------
If anyone has any suggestions about how to avoid this core change, I’d love to hear them!
I’m planning to post a copy of the actual extension here sometime in the coming week … if you’d like a copy before then (minus some cleanup) I’d be fine emailing it to you … matt0perry [att] gmail [d0t] c0m
Ideas? Comments? Suggestions for Improvement?