In building out Blueprint, we have taken an approach of “cache everything” (or what we think is most important) to make the app super responsive. Our customers love this.
What we actually cache is, essentially, a data digest used in the reporting view (so not the view or the model per-se). These digests could be a daily, weekly, monthly or annual view of a perspective on a data cube. Each day we add new, updated data to our cubes and need to expire the cached items that pertain to that day (e.g. latest week, month, year). In addition, we’re often recalculating categorizations for clients which means expiring their entire cached history and rebuilding it.
And so, we will expire 50,000 cache keys each morning as we push in new data (and then we go rebuild them). For a given site we might delete 1000-2000 keys.
Up until this week we used the
delete_matched method to remove all of the site’s keys in a single call (optimizing our redis time). However,
delete_matched relies in the redis
keys method which is slow, blocks the redis server and isn’t meant to be used in production. In fact, I could watch our redis server’s CPU go from 0.1% to 100.0% exactly during the call to expiring site caches. Because redis is mostly single threaded, pinning the CPU for processor-heavy operations holds back all the fast GET calls from our cache reads and slows the site to a crawl.
The solution was clearly we couldn’t rely on redis to tell us what keys were in the cache. Michael and I whiteboarded this and built (well, Michael mostly built) a replacement cache store that inherits from
RedisStore and effectively does three changes: stores the key name into a set (we partition sets, which is hidden in the code below) when the cache entry is written, remove the key name from the set when the cache entry is deleted, use the set to find the matched keys for
delete_matched and remove those from the cache and the set.
def delete_matched(pattern) sub_set = find_matching_keys(@set, pattern) Redis.current.srem(@set, sub_set) Redis.current.del(sub_set) end private def write_entry(key, entry) Redis.current.sadd(@set, key) super end def delete_entry(key) Redis.current.srem(@set, key) super end
I was pretty surprised not to find other solutions for this already or that the
redis-store implementation doesn’t do this automatically. However, when we implemented it, the tricks in partitioning the set (to keep that read time low) lead me to believe it’s not a generalized solution (although you could use hashing to generalize it). We also had to build a process to build up the set from the existing 3,541,239 keys in our cache, so we’ll see if that takes the weekend or just the night.