The purpose of this post is to share knowledge about the nuts and bolts of the Magento cache system with developers, and to present one method of overcoming some limitations of the file cache storage class.

This article started with the site of a client who was having performance issues: once the cache reached about 38,000 records, he had to clear it to keep the site responsive enough.

How strange is that? Shouldn’t a full cache give better performance than an empty cache?

The number of records stored in the Magento cache depends on many factors, among them the number of store views, whether a full page cache is used, and the size of the catalog. Many sites don’t go above 1,000 or 2,000 cache records, but for large instances much higher values are common.

The performance issues of my client only occurred on specific pages, one of the most noticeable being the add-to-cart action. Adding a product to the cart took up to 4 seconds!
Using some profiling, we could pin the issue down to part of the (full page) cache being cleared, more precisely to the moment when the mini-cart in the page header had to be rebuilt.

Let’s have a look at how Magento caching (which uses the Zend_Cache library) works.
Each cache entry consists of the following information:

  • the cached data
  • a cache key (or ID), that uniquely identifies this entry and is used to retrieve the data from the cache
  • a cache lifetime, after which the cache entry expires
  • zero or more cache tags

Most of these are obvious, but what is the purpose of cache tags? Cache tags are used to segment the cache for partial deletion. For example, you could clear all entries of the configuration cache without touching any entries of the HTML block cache.
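
To make the role of tags concrete, here is a minimal in-memory sketch. The class ToyTaggedCache and its method names are hypothetical and only loosely mirror Zend_Cache; this is an illustration under those assumptions, not the Magento implementation.

```php
<?php
// Toy model of a tagged cache. Hypothetical class, not part of Magento or Zend_Cache.
class ToyTaggedCache
{
    // id => array('data' => ..., 'tags' => array(...), 'expire' => timestamp)
    private $entries = array();

    public function save($data, $id, array $tags = array(), $lifetime = 3600)
    {
        $this->entries[$id] = array(
            'data'   => $data,
            'tags'   => $tags,
            'expire' => time() + $lifetime,
        );
    }

    public function load($id)
    {
        if (!isset($this->entries[$id]) || $this->entries[$id]['expire'] < time()) {
            return false; // unknown or expired entry
        }
        return $this->entries[$id]['data'];
    }

    // Remove every entry that carries at least one of the given tags.
    public function cleanMatchingAnyTag(array $tags)
    {
        foreach ($this->entries as $id => $entry) {
            if (array_intersect($tags, $entry['tags'])) {
                unset($this->entries[$id]);
            }
        }
    }
}

$cache = new ToyTaggedCache();
$cache->save('<config/>', 'config_global', array('CONFIG'));
$cache->save('<div>mini cart</div>', 'block_minicart', array('BLOCK_HTML'));

// Clearing the CONFIG tag leaves the BLOCK_HTML entry untouched.
$cache->cleanMatchingAnyTag(array('CONFIG'));
```

After the clean call, loading config_global fails while block_minicart is still served from the cache, which is exactly the partial deletion that tags exist for.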

On the cache management page most of the cache tags used by Magento are listed. Depending on the modules and extensions installed there may be more or fewer tags, e.g. CONFIG, LAYOUT_GENERAL_CACHE_TAG, BLOCK_HTML, TRANSLATIONS, FULL_PAGE_CACHE.

Listed Cache Tags - more are used internally

All the information associated with a cache record is saved in the cache storage every time the method Mage::app()->saveCache() is called. To read cache records, the method Mage::app()->loadCache() is used.
Magento offers several options for the cache storage, and each of these storage systems is accessed through a PHP class called a “cache backend”.

By default, cache data is stored in files (located in the directory var/cache/). Another option is the database, which is slower than the files option, mostly because the database is a heavily used resource for Magento instances anyway. Then there are storage schemes that use RAM (which are much faster), e.g. APC (shared memory) or memcached (a networked caching server).

So, let’s use RAM as our cache storage, right?

Sounds good, except for one problem: APC and memcached only support storing simple key-value pairs, so the cache tags are lost! This renders the whole caching rather useless, because every time we need to clear only one part of the cache, EVERY cache entry is cleared.

But do not despair! The Zend Framework contains a solution to this problem: a special cache backend called TwoLevels. The TwoLevels backend uses a fast cache backend (e.g. APC or memcached) for the cache data, and a slow backend (e.g. files or database) for the lifetime and the cache tag information. This way we can have the best of both worlds!
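
The division of labour described above can be sketched as follows. ToyTwoLevelCache is a hypothetical class that follows the description in this article; it is not the code of Zend_Cache_Backend_TwoLevels, and plain arrays stand in for the real storage systems.

```php
<?php
// Sketch of the two-level idea. Hypothetical class, not Zend_Cache_Backend_TwoLevels.
class ToyTwoLevelCache
{
    private $fast = array(); // stands in for APC/memcached: id => data only
    private $slow = array(); // stands in for files/database: id => tags and expiry

    public function save($data, $id, array $tags = array(), $lifetime = 3600)
    {
        $this->fast[$id] = $data;
        $this->slow[$id] = array('tags' => $tags, 'expire' => time() + $lifetime);
    }

    public function load($id)
    {
        // Reads are served by the fast store (expiry handling omitted for brevity).
        return isset($this->fast[$id]) ? $this->fast[$id] : false;
    }

    public function cleanMatchingAnyTag(array $tags)
    {
        // Only the slow store knows the tags, so it decides which IDs to drop.
        foreach ($this->slow as $id => $meta) {
            if (array_intersect($tags, $meta['tags'])) {
                unset($this->fast[$id], $this->slow[$id]);
            }
        }
    }
}

$twoLevel = new ToyTwoLevelCache();
$twoLevel->save('cart html', 'minicart_42', array('CART_42'));
$twoLevel->save('footer html', 'footer', array('BLOCK_HTML'));
$twoLevel->cleanMatchingAnyTag(array('CART_42'));
```

Note that every tag-based clean has to go through the slow store, which is why the performance of that store matters even when the data itself lives in RAM.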

According to the excellent performance whitepaper (login required) Magento has released, it is recommended to use APC for the fast backend and files for the slow backend for single web server setups. If you use a cluster of web servers, you should use memcached for the fast backend and the database for the slow backend. The latter has more overhead than the APC/files combo, but having all servers share one cache pool makes up for that.

Magento makes it easy to configure all this by the way – have a look at the file app/etc/local.xml.additional for further information. I will not go deeper into the setup here because we have already written about this.

Now, back to the problem at hand – why is a nice, full cache slower than an empty one?
The site in question was using APC as the fast backend and files as the slow backend.

So, let’s have a look at what is happening.
Every time a product is added to the cart, the following method is called:

Mage::app()->getCache()->clean(
    Zend_Cache::CLEANING_MODE_MATCHING_ANY_TAG,
    array($cacheTag)
);

$cacheTag is an MD5 hash built from several parameters identifying the active customer’s mini-cart block cache.
So, how do the TwoLevels and the file cache backends handle this?

In the method Zend_Cache_Backend_TwoLevels::clean() each request with the cleaning mode CLEANING_MODE_MATCHING_ANY_TAG fetches all cache IDs matching the given tags from the slow backend using the method getIdsMatchingAnyTags(), and then removes one cache entry after the other in a loop.

case Zend_Cache::CLEANING_MODE_MATCHING_ANY_TAG:
    $ids = $this->_slowBackend->getIdsMatchingAnyTags($tags);
    $res = true;
    foreach ($ids as $id) {
        $bool = $this->remove($id);
        $res = $res && $bool;
    }
    return $res;
    break;

Let’s dive in a little deeper and find out how the Zend_Cache_Backend_File backend finds those cache IDs.
The interesting part happens in the method _get(). Here a list of all cache entries is built; the code then loops over each one, reads in the corresponding metadata file (using file_get_contents() and unserialize()), and checks whether the cache entry is associated with a matching cache tag.

The following code is slightly simplified for educational purposes:

$result = array();
$glob = @glob($cacheDir . $prefix . '--*');
if ($glob !== false) {
    foreach ($glob as $file) {
        $fileName = basename($file);
        $id = $this->_fileNameToId($fileName);
        $metadatas = $this->_getMetadatas($id);
        …
        switch ($mode) {
            …
            case 'matchingAny':
                $matching = false;
                foreach ($tags as $tag) {
                    if (in_array($tag, $metadatas['tags'])) {
                        $result[] = $id;
                        break;
                    }
                }
                break;
            …
        }
    }
}

This code is probably fine for a couple of hundred cache entries, but after not even a day my client’s Magento instance reached 40k records. Opening and unserializing thousands of files takes some time, even on a strong machine.

It turns out the file backend doesn’t scale well.
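
The cost pattern is easy to reproduce in isolation. The following sketch assumes a made-up layout of one serialized metadata file per entry, named meta--<id>; the function name and file format are illustrative and not those of Zend_Cache_Backend_File.

```php
<?php
// Sketch of a linear tag lookup over per-entry metadata files.
// File naming and layout are assumptions for illustration only.
function findIdsByScanning($dir, array $tags)
{
    $ids = array();
    $files = glob($dir . '/meta--*');
    if ($files === false) {
        return $ids;
    }
    foreach ($files as $file) {
        // One read and one unserialize per cache entry, whatever its tags are.
        $meta = unserialize(file_get_contents($file));
        if (array_intersect($tags, $meta['tags'])) {
            $ids[] = substr(basename($file), strlen('meta--'));
        }
    }
    return $ids;
}

// Build 200 fake entries spread over 10 tags in a temporary directory.
$dir = sys_get_temp_dir() . '/toy_scan_' . uniqid();
mkdir($dir);
for ($i = 0; $i < 200; $i++) {
    $meta = array('tags' => array('TAG_' . ($i % 10)));
    file_put_contents($dir . '/meta--entry' . $i, serialize($meta));
}

$ids = findIdsByScanning($dir, array('TAG_3'));

// Remove the temporary files again.
array_map('unlink', glob($dir . '/meta--*'));
rmdir($dir);
```

Only a tenth of the entries carry the requested tag, yet all 200 files are opened and unserialized. The work grows with the total size of the cache, not with the number of matches.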

The obvious remedy is some kind of index that maps the tags to the matching cache IDs.
Instead of writing that information into a file that would have to be parsed and updated, I decided to use the filesystem itself as the index, utilizing directories and symlinks.

In the modified file cache backend developed for the client, every time a cache entry is written, a directory with the name of each tag is created and a symlink to the metadata file is placed into it. This adds a little overhead to the write operation, and makes no difference when reading cache entries. But every operation that matches tags is a lot faster. And since Magento makes heavy use of cache tags, the effect is quite noticeable, depending on the number of records in the cache. For example, adding a product to the cart now takes less than a second on the original instance. And I’m happy to say the reports from the friends who have been nice enough to test the extension have been positive for smaller sites as well.
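
The idea can be sketched in a few lines. The directory layout (a tags/<tag>/ directory holding one symlink per cache ID) and the function names below are assumptions made for illustration; this is not the code of the published module.

```php
<?php
// Sketch of a symlink-based tag index. Layout and names are assumptions.
function saveWithTagLinks($baseDir, $id, $data, array $tags)
{
    $metaFile = $baseDir . '/meta--' . $id;
    file_put_contents($metaFile, serialize(array('data' => $data, 'tags' => $tags)));
    foreach ($tags as $tag) {
        $tagDir = $baseDir . '/tags/' . $tag;
        if (!is_dir($tagDir)) {
            mkdir($tagDir, 0777, true);
        }
        $link = $tagDir . '/' . $id;
        if (!is_link($link)) {
            symlink($metaFile, $link); // the extra work on write
        }
    }
}

function findIdsByTagLinks($baseDir, array $tags)
{
    $ids = array();
    foreach ($tags as $tag) {
        $tagDir = $baseDir . '/tags/' . $tag;
        if (!is_dir($tagDir)) {
            continue;
        }
        // Listing one directory replaces reading every metadata file.
        foreach (scandir($tagDir) as $name) {
            if ($name !== '.' && $name !== '..') {
                $ids[$name] = true; // keyed to avoid duplicates across tags
            }
        }
    }
    return array_keys($ids);
}

$base = sys_get_temp_dir() . '/toy_symlink_' . uniqid();
mkdir($base);
for ($i = 0; $i < 200; $i++) {
    saveWithTagLinks($base, 'entry' . $i, 'data ' . $i, array('TAG_' . ($i % 10)));
}

$ids = findIdsByTagLinks($base, array('TAG_3'));

// Clean up the temporary directory tree.
array_map('unlink', glob($base . '/tags/*/*'));
array_map('rmdir', glob($base . '/tags/*'));
rmdir($base . '/tags');
array_map('unlink', glob($base . '/meta--*'));
rmdir($base);
```

The lookup now touches only the directory of the requested tag, so its cost depends on the number of matching entries rather than on the size of the whole cache.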

Benchmarks!

Inspired by Marc Jakubowski’s comment below, I added a little benchmark script. Here are some results to give a more detailed idea.
UPDATE: Thanks to Collin Mollenhour’s comment below, I also added benchmarks for Redis using his Zend Cache Backend Class – it’s incredibly fast, so if you have the possibility to use Redis 2.4 or newer, go for it! For this test I used the default Redis configuration, with the server running on the same machine on which I ran the tests. Before using it in production I would like to write some unit tests for this new backend, but it seems to work okay.

Records  Tags  Avg. Records / Tag  Cache Backend  Avg. Time for getIdsMatchingTags()
50000    250   ~700                File           20.71s
50000    250   ~700                Symlink        2.28s
50000    250   ~1500               Redis          0.01s
6000     120   ~175                File           2.41s
6000     120   ~175                Symlink        0.23s
6000     120   ~370                Redis          0.002s
2000     80    ~88                 File           0.78s
2000     80    ~88                 Symlink        0.08s
2000     80    ~180                Redis          0.001s

The numbers for the records and tags were chosen because they roughly correspond to the average cache pool size and number of tags on several live Magento instances of different sizes. These benchmarks were run on my laptop, and obviously the results may differ depending on the drive, file system, system memory etc. Please go ahead and test on your system.
One interesting thing about the Redis backend is that the average number of IDs per tag is much higher than with the other backends. I’m not sure why; my guess is that it has something to do with the way Redis manages the tag hash tables. I don’t have time to check it out at the moment, but maybe Collin already knows why.
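
If you want to take comparable measurements on your own system, a small timing helper is enough. The helper below is a hypothetical sketch and not the benchmark script mentioned above; the workload only unserializes in-memory strings, so swap in a call to the backend you want to measure.

```php
<?php
// Hypothetical helper: average wall-clock time of a callable over several runs.
function averageSeconds($callable, $runs = 5)
{
    $total = 0.0;
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        call_user_func($callable);
        $total += microtime(true) - $start;
    }
    return $total / $runs;
}

// Example workload: unserialize 1000 small metadata arrays, roughly what a
// linear tag scan does per lookup (minus the disk reads).
$payloads = array();
for ($i = 0; $i < 1000; $i++) {
    $payloads[] = serialize(array('tags' => array('TAG_' . ($i % 10))));
}

$avg = averageSeconds(function () use ($payloads) {
    foreach ($payloads as $payload) {
        unserialize($payload);
    }
});

printf("avg %.6fs per run\n", $avg);
```

Averaging over several runs smooths out file system caching effects a little, but the first run against a cold disk cache can still dominate the result.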

By the way, according to the PHP reference, the symlink function is no longer limited to Unix systems; since PHP 5.3 it is also available under Windows (Vista, Server 2008 or greater). I haven’t tested this, though.

This module is provided with no warranty; you use it at your own risk. I know it is being used successfully on several sites, both as a primary cache backend and as a slow backend in combination with APC or memcached.
I invite you to have a look at it and try it out, but please start with a test instance and not your live store. You can download the module from Magento Connect or from GitHub.
If you find bugs or have improvements, please send them in.

After installing the extension, clear the config and block_html caches and visit System > Tools > Symlink Cache.

Symlink-Cache Utility Page
There you can see sample XML that you will have to add to your app/etc/local.xml file for both variants: using it on its own, or using it as a slow backend.
Then, after you have updated the local.xml file, clear the config cache, go to the Symlink Cache page and hit the “Initialize Tag Symlinks” button.
And that is all. Your system is set up and uses the tag symlinks. Enjoy!
I would be happy to hear about other methods, ideas and experiences for improving cache performance.

Downloads: Magento Connect or GitHub

Originally published on magebase.com. Copyright © 2011 Magebase - All Rights Reserved.

