
Output Cache Improvements in Orchard 1.9


Output caching has been significantly overhauled in the upcoming Orchard 1.9 release. This post takes an in-depth look at what precipitated these changes, how the new output cache logic works, and how to best configure and use it to improve the performance of your sites.

Background

The release of Orchard 1.9 is imminent (yes I know, it’s been “imminent” for about 5 months now, but this time it’s really imminent) and one of the contributions made by us at IDeliverable for this release is a major overhaul of the output cache logic. The modifications significantly alter the way the output cache operates, and so I wanted to describe these changes in depth once and for all so that folks out there have some place to turn to really understand how output caching works under the hood and how to best put it to use for their sites.
All ye TL;DR-type people be warned - this is a long and detailed post. Now let's dig in!
The previous output cache implementation (i.e. before 1.9) had one serious performance issue, which would typically rear its ugly head whenever the following conditions were met:
  • Your site is running with Orchard.OutputCache enabled (duh).
  • Some resource on your site (such as a page) is CPU and/or database-intensive to render.
  • The resource is output cached with a finite expiration time.
  • The resource is under heavy user load (or more technically, the average time between requests for the resource is considerably less than the time it takes to render the resource).
We discovered this issue while working with a client who experiences an annual peak load during February and March. Luckily, they are proactive enough to also run annual performance tests well in advance of this peak period (a practice I would recommend to anybody, by the way) and it was during such load testing that we found their site kept crashing when load reached a certain level. The first symptom in their case was of course longer response times and abnormally high server CPU utilization. The second was that the ADO.NET connection pool was exhausted. The third was total denial of service.
After analyzing the issue and examining the code in Orchard.OutputCache, we realized this was happening due to the way the caching logic was designed. To understand the cause of the problems and the sequence of events leading up to an eventual crash, let’s look at a narrowed-down example involving a site with a fictitious page A:
  1. Page A contains a bunch of content and renders a bunch of menu widgets and projection widgets. Because it’s such a content-heavy page (and because Orchard is not the fastest crayon in the box) it is both CPU- and database-intensive, and takes 2 seconds to render from scratch, on an otherwise idle server.
  2. Our site is under heavy load, and page A is getting 10 requests per second.
  3. This is not a problem, because page A is currently in output cache, so the site is happily humming along and all requests are satisfied in a matter of milliseconds.
  4. Page A expires from the output cache.
  5. The next request for page A finds it missing from cache, and starts to generate it. This will take at least 2 seconds.
  6. 100 ms later the next request for page A comes in. It too finds it missing from cache, and starts to generate it. Now both requests are fighting for the same scarce CPU and database resources, and as a result, both will now likely take twice as long to complete. The estimated time until page A is once again in cache just went from 2 seconds to 4 seconds.
  7. More and more requests for page A are received, and things get worse with every one. The more requests start rendering page A, the longer they all take to complete because they compete for the same CPU and database resources, and as a result even more requests have time to arrive and start doing the same thing. Call it a feedback loop, spiraling effect, snowball effect – “a dear child has many names” as we say in Swedish – the point is, the problem exacerbates itself and it only gets worse from here.
  8. In the best case, at least one of the requests eventually succeeds in rendering page A and puts it back into the cache, the ones that fail do so in a graceful manner, and your site recovers (at least until the next time page A expires from the cache). In the worst case, your site crashes and burns.
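The spiral in steps 5 through 8 can be illustrated with a small, deterministic toy model (a sketch of my own, not Orchard code; it borrows the 2-second render time and 100 ms arrival interval from the example, and simplifies by assuming render time scales linearly with the number of concurrent renders):

```python
# Toy model of a cache stampede: while page A is missing from cache, a new
# request arrives every 100 ms and starts rendering it, and all in-flight
# renders share the same CPU (a simplification).

def simulate_stampede(base_render_ms=2000, arrival_ms=100, max_steps=1000):
    """Return (concurrent-render count per step, ms until the first render
    completes, or until the simulation window is exhausted)."""
    in_flight = []   # remaining single-threaded render work (ms) per request
    history = []
    for step in range(max_steps):
        in_flight.append(base_render_ms)  # next request arrives, cache still empty
        n = len(in_flight)
        history.append(n)
        # With n renders competing, each one only advances arrival_ms / n.
        in_flight = [w - arrival_ms / n for w in in_flight]
        if in_flight[0] <= 0:             # oldest render finished; cache repopulated
            return history, (step + 1) * arrival_ms
    return history, max_steps * arrival_ms
```

With a single render (no stampede) the cache would be repopulated after the base 2 seconds. In this model, the pile-up grows so fast that the first render never completes within a 100-second window, which is exactly the denial-of-service spiral described above.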
Naturally, once I realized the problem I set out to fix it.

Possible Solutions

For an output caching solution to work reliably under the above conditions, there are three basic measures one might employ:
  1. Prevent multiple concurrent requests for the same resource from rendering that resource in parallel. Instead, let only the first request render the resource and make any subsequent requests block and wait until the rendering is finished. This is the most effective remedy, and for all practical purposes solves the problem.
  2. Introduce a “grace time” between the expiration of a resource and the actual removal of that resource from cache. If an expired version of the resource still exists in cache, instead of blocking subsequent requests, simply serve them the stale content. This goes one step further and also improves the response time for those requests that would otherwise be blocked waiting for the first one to render the resource. It also shortens the request queue on the web server and uses fewer concurrent threads. In our example scenario above, this would mean 20 fewer requests waiting while the resource is being rendered.
  3. Proactively “pre-cache” resources and refresh them before they have a chance to expire.
One or more of these strategies can be found in most professional-grade caching solutions, such as nginx or Varnish.
In my opinion, combining methods #1 and #2 is the best way to go for Orchard. They are relatively simple to implement because they both act within the context of existing requests, and they remove almost 100% of the problem. #3 is more complex and introduces new moving parts to the system (there needs to be some background task which renders resources independent of any incoming user requests). Additionally, for #3 alone to be effective, there needs to be a warmup period during which all resources are pre-cached before the server can start accepting incoming requests, otherwise the same problem will arise if this happens during heavy load. And besides, the only advantage that #3 brings over the other two, is that it gives a faster response time for that one guy who happens to be the first one to request an expired resource. Hardly a game-changer.
So, for Orchard 1.9 I decided (after getting approval from the steering committee of course) to implement the first two measures.

Implementation in Orchard

Considerations

When designing the new logic in Orchard, I had to consider a few challenges:
  • The storage mechanism for output cache (like most things in Orchard) is extensible and provider based. Depending on the underlying storage provider, expiration and eviction from the cache is likely done by the cache itself (by specifying an expiration policy when adding the item to the cache) rather than actively by Orchard. Therefore, in order to be able to serve stale content, an item must be considered expired by Orchard (and regeneration must begin) before that item actually expires at the storage level.
  • Serving from cache is done in one method, while adding rendered content to the cache is done in another. This makes it difficult to reliably hold a lock for the duration of the request (no using statements or try/finally blocks are possible). This must be carefully considered – what if the request fails in such a way that the second part never executes? That could easily lead to deadlocks if we’re not careful.
  • The time it takes to render a given item is unknown. Any introduction of pre-fetch or grace times will therefore be arbitrary. Too wide a margin and cached content will be re-rendered too often - too narrow a margin and a number of requests will have to block. Ideally the time spans involved need to be configurable in output cache settings, and whether rendering takes longer or shorter than expected, it all needs to be handled gracefully.
  • Orchard is often deployed in web farms. The cache storage might be distributed/synchronized across farm nodes, but .NET thread synchronization primitives most certainly are not. Therefore, we must either use database transactions for cross-farm synchronizations, or let each node act independently in terms of caching logic and simply accept that multiple farm nodes will most likely race to render the same content. I decided that the latter was a completely acceptable trade-off, and should be considered a benign race condition.

New Configuration

To account for the fact that rendering time might vary from one resource to another, and to make grace time configurable in the output cache settings page, I added both a default grace time setting and the ability to override it per individual route, just like for the duration:

As you’ll notice, you can leave the grace time column per route empty to fall back on the configured default grace time, or you may specify 0 to disable grace time altogether for that route. I’ll provide some guidance later in this post as to what you should consider when configuring these values.
To account for the fact that items are most often expired/evicted by the cache itself, there are now two datetime properties associated with a cache item:
  • ValidUntilUtc specifies the time at which the item is considered expired by Orchard. The first request for the item after this time will be tasked with regenerating it and refreshing the cache. This property is calculated as the time when the item is stored in the cache plus the configured duration for the item.
  • StoredUntilUtc specifies the time at which the item will be actually removed from the cache. This property is calculated as the ValidUntilUtc property value plus the configured grace time for the item. This is the value that is actually specified to the underlying cache storage as the expiration time; in the default storage implementation (which uses the ASP.NET cache) the cache will remove the item at this time.
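The relationship between the two timestamps is simple arithmetic, sketched here in Python for illustration (the property names follow the post; the helper function itself is hypothetical, not part of Orchard):

```python
from datetime import datetime, timedelta, timezone

def cache_item_times(stored_utc, duration_s, grace_time_s):
    """Compute the two expiration timestamps for a cached item.

    valid_until_utc  - after this, Orchard treats the item as expired and the
                       next request regenerates it (stale copies can still be
                       served during the grace period).
    stored_until_utc - after this, the underlying cache storage evicts the item.
    """
    valid_until_utc = stored_utc + timedelta(seconds=duration_s)
    stored_until_utc = valid_until_utc + timedelta(seconds=grace_time_s)
    return valid_until_utc, stored_until_utc

# With the 1.9 defaults (300 s duration, 60 s grace time), an item stored at
# 12:00:00 UTC is considered expired at 12:05:00 and evicted at 12:06:00.
now = datetime(2015, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
valid, stored = cache_item_times(now, 300, 60)
```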
Both of these values can now be seen in the Statistics tab in the output cache settings page:

New Caching Logic

Based on these new configuration and storage values, the new output caching logic performs synchronization of concurrent requests for the same resource, as well as serving stale content during the configured grace time for a given resource. Let’s take a look at how the algorithm works.
The output caching logic resides in the Orchard.OutputCache.Filters.OutputCacheFilter class in the Orchard.OutputCache module. This class is both an IActionFilter and an IResultFilter in ASP.NET MVC parlance. For the purposes of output caching, the filter does its magic in the OnActionExecuting() and OnResultExecuted() methods, respectively. This separation of logic between two methods, each executing at a separate end of the request, is what requires some extra care when managing locks.
Let’s use two diagrams to illustrate how these two methods operate, starting with the OnActionExecuting() method:

Some things to note:
  • Brown items on the diagram indicate beginning and end of the request.
  • The filter maintains a ConcurrentDictionary containing cache keys and lock objects for each cache key. These lock objects are used to synchronize concurrent requests for each cacheable item individually. Orange items on the diagram indicate critical sections, i.e. sections of the logic during which the lock for a given cache key is held by the current request.
  • The “request allowed for cache?” step performs a number of checks to ensure the request is eligible for output caching. If not, then the whole output cache logic is bypassed and the request executes as it would without output cache enabled. These checks include:
    • respecting any OutputCacheAttribute on the controllers and actions invoked
    • not caching any POST requests
    • not caching any admin dashboard requests
    • not caching child actions
    • not caching requests that are configured in output cache settings to not be cached
  • The “compute cache key” step determines a unique cache key for the resource. This key includes not only the resource identifier, but also things like the tenant name, action parameters, configured query string parameters, culture, request headers, and whether the request is authenticated or not.
  • If the cache key is found in the cache:
    • The filter checks to see whether it has expired (i.e. the ValidUntilUtc value of the item has passed). If this is the case, the filter can assume the item is in its grace period (if it were past its grace period, it would have been automatically evicted from the cache and we would not have found it there in the first place). If the item hasn’t expired yet, it is simply sent to the client and the rest of the request is short-circuited.
    • If the item has expired (i.e. is in its grace period) the filter checks to see if the lock for the cache key can be acquired. If not, then some other request is already in the process of regenerating the content, so simply send the stale cached content to the client and short-circuit the rest of the request.
    • If the lock could be acquired, the filter sets up a capture of the response and executes the rest of the request.
  • If the cache key is not found in the cache:
    • The filter tries to acquire the lock for the cache key, with a timeout of 20 seconds. This is the mechanism that causes the request to block and wait if rendering of the requested resource is already in progress, and no stale content exists. If the lock cannot be acquired within the timeout, the request is executed with output caching completely bypassed. The timeout is there primarily as a fail-safe against the theoretical possibility that some request fails to release the lock. If this happens, at least the site can continue to operate normally for requests to all other resources instead of building an infinite queue of blocking requests for the offending one.
    • If the lock could be acquired successfully, the filter rechecks the cache to see if the item is now in the cache (which would be the case if we waited for another request to render the item). If it is, the lock is released and the cached item is sent to the client.
    • If the item is still not in cache, the filter sets up a capture of the response and executes the rest of the request.
  • There are some additional twists and turns in the implementation that I have omitted from the diagram for clarity, such as the fact that a “hard refresh” from the client forces regeneration of an item regardless of its current cache status.
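The decision flow above can be sketched in simplified, language-agnostic form. This is illustrative Python of my own, not the actual C# (which lives in Orchard.OutputCache.Filters.OutputCacheFilter); all names and the plain dictionary cache are stand-ins:

```python
import threading
from collections import namedtuple
from datetime import datetime, timedelta, timezone

CacheItem = namedtuple("CacheItem", "output valid_until_utc")

class OutputCacheSketch:
    """Simplified single-node model of the cache-serving side of the filter."""

    LOCK_TIMEOUT_S = 20  # fail-safe: don't queue forever behind a stuck render

    def __init__(self):
        self._cache = {}            # cache key -> CacheItem
        self._locks = {}            # cache key -> per-key lock object
        self._locks_guard = threading.Lock()

    def _lock_for(self, key):
        # Stands in for the ConcurrentDictionary of per-key lock objects.
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def try_serve(self, key, now=None):
        """Return (output, must_render, lock_held) for an incoming request."""
        now = now or datetime.now(timezone.utc)
        lock = self._lock_for(key)
        item = self._cache.get(key)
        if item is not None:
            if now < item.valid_until_utc:
                return item.output, False, False  # fresh: serve, short-circuit
            if not lock.acquire(blocking=False):
                return item.output, False, False  # someone else renders: serve stale
            return None, True, True               # expired; we won the race, so render
        if not lock.acquire(timeout=self.LOCK_TIMEOUT_S):
            return None, True, False              # fail-safe: render, bypass caching
        item = self._cache.get(key)               # re-check: the cache may have been
        if item is not None:                      # filled while we waited on the lock
            lock.release()
            return item.output, False, False
        return item, True, True                   # miss; we hold the lock, so render
```

Note how a stale item is only served when the per-key lock is already held by another request; the first request to reach an expired item wins the lock and regenerates the content, exactly as in the diagram.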
Now let’s look at a similar diagram of the OnResultExecuted() method to illustrate what happens after the request has been executed:

Here’s what happens:
  • Depending on what happened in OnActionExecuting() the thread may or may not hold the cache key lock at this point. The diagram assumes the former, which is why the relevant items are in orange.
  • If the response was captured (because a capture was set up in OnActionExecuting()) then the filter first checks if the response is allowed to be cached. If not, then some cache control headers are included in the response to prevent caching on proxy servers etc. This check includes:
    • not caching responses with an HTTP result code other than 200
    • not caching routes which are configured to not be cached
    • not caching the response if the request created any notification messages
  • If the response was deemed eligible for caching, it is written to the cache.
  • If the cache key lock is held by the current thread, it is released.
  • Finally, the response is sent to the client, and the request ends.

Result

These modifications mean dramatically improved scalability characteristics for Orchard sites.
After completing the implementation, we once again put that same client’s site through a grueling round of load testing. We expected significant improvements, but quite frankly we were baffled by the result. The vendor that carries out the performance testing literally ran out of load agent capacity before we observed any noticeable impact on the Orchard site in terms of response times, CPU utilization or database query intensity.
For the most part, the site was now just happily humming along, effortlessly serving all content from cache. Once in a while, as expected, a small increase in resource use could be observed as a confirmation that some piece of content expired from cache and was being regenerated.
It’s no great mystery: the combination of blocking and grace time means that at any given time, no matter how short the expiration time of your content and no matter how many users are concurrently hammering your site, at most one of them will ever be rendering a given piece of content on your site. The rest are either waiting idle in the worst case, or served stale cached content in the best case.

Other Improvements

Aside from a lot of cleanup and refactoring of the output cache code, and a bunch of settings UI usability improvements, I also seized the opportunity to introduce a couple of other small functional improvements to the output cache module. Let’s take a look at them in this screenshot:

The labels and hints should be pretty self-explanatory. You now have the option to cache not only anonymous but also authenticated requests. You also have the option to cache different versions of rendered resources depending on whether the request was authenticated or not. This is useful on sites where pages do not contain any personal information for logged-in users, but where the rendered markup differs depending on whether the user is logged in or not.

Configuration Recommendations

So with these two configuration options (duration and grace time) now at your disposal, how should you configure them?
Well, I'll give you some recommendations based on my personal experience and preferences, but don't take them as absolute truth because all Orchard sites are different and YMMV - you need to consider the nature of your content and test the performance characteristics of your sites to make good determinations!
Let’s start with duration. This one comes down to a trade-off between how expensive your content is to render, how volatile it is (i.e. how frequently it changes) and how important it is that clients see an up-to-date version of the content. If your content is extremely static and extremely expensive to render, consider setting the duration to a very high value, such as 43,200 seconds (12 hours). If your content changes frequently or is very fast to render, consider setting the duration to a very low value, such as 30 seconds or even 15 seconds. If your content is expensive to render and changes frequently, you’re going to have to apply your judgment and make a reasonable trade-off. One good approach here is to run load tests, which can give you an indication of where the sweet spot is.
Grace time, on the other hand, comes down to how long you think it is acceptable to serve a stale (expired) version of your content. Most often this is proportional to the acceptable duration, but not always. Paradoxically though, the higher your user load is, the less likely it is that any stale cache item will remain in the cache for very long, because the next user will soon be along to request it and cause it to be regenerated; and the lower your user load is, the less useful the grace time becomes in the first place, because blocking is less likely to happen anyway. As a general rule of thumb, if your content changes frequently then set your grace time to half the duration; otherwise, if your content is highly static, set your grace time to double the duration.
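Expressed as a tiny hypothetical helper (the function and its name are mine, not part of Orchard, which simply exposes the two settings):

```python
def recommended_grace_time(duration_s, changes_frequently):
    """Rule of thumb from the text: half the duration for frequently-changing
    content, double the duration for highly static content."""
    return duration_s // 2 if changes_frequently else duration_s * 2
```

For example, with a 300-second duration, volatile content would get 150 seconds of grace time and highly static content 600 seconds.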
Now, I realize not all your content shares the same characteristics. Unfortunately there's no way (yet) in Orchard to configure these things based on anything other than the route, which means in practice you have to pick one set of values for all your content, so you're just going to have to find a reasonable compromise that works well for the majority of your content. Ideally I think the configuration ought to be more granular and based on composition, so that you would be able to specify values on content types, content items, layers, widgets etc., and have them all result in a calculated duration and grace time for the final rendered page depending on which parts contributed to it. Who knows, maybe some day we'll take a stab at building such a configuration system into Orchard - if you or your company would find that valuable and are interested in co-funding such an effort, do get in touch.
The default values for a new Orchard installation are a duration of 300 seconds and a grace time of 60 seconds.
That’s it. If you made it this far, I’m impressed – you must really care about output caching! ;) And indeed you should! I had tons of fun working on these improvements, and I’m excited to see what kinds of results folks are going to see in terms of performance and scalability now that it goes into the wild and production sites start getting upgraded to Orchard 1.9. We sincerely hope this work will benefit other Orchard users out there as much as it has our clients.


