
Why CDN Cache Efficiency Matters For Your Business

Blog Post created by smehrotra on Feb 12, 2016

Web applications and websites are not just another marketing avenue; they are the gateway to your brand. Customers and prospects form an opinion about your business before they interact with anyone from your company. The online experience can propel customers your way or drive them to abandon your brand altogether. Content Delivery Networks (CDNs) work transparently behind the scenes of your web infrastructure to make the online experience better for your end users. But not all CDNs are created equal. Some are more efficient than others, and understanding how to measure that efficiency can have a material impact on your business.

 

What’s a CDN?

CDN technology isn’t new. In fact, if you do business online, there is a high likelihood that your content is already delivered by one without your knowing it. This technology helps your online presence expand quickly on a global scale, without the need for costly, CapEx-intensive data center investments, by acting as a proxy for content requests. The CDN sits between your content origin and your end users and “caches” content that is frequently requested. Because of this, every time the CDN serves content to the end user, you benefit in two primary ways:

  • Offloading requests—saving costs associated with serving those requests (bandwidth and origin infrastructure)
  • Latency reduction—providing performance benefits by cutting down latency of requested content

 

Without caching, a request for content that is not in the browser cache goes all the way back to the origin, which could be thousands of miles away. That adds cost, and the request has to pass through congestion on the public Internet, which degrades your end user’s experience.

 

What Makes One CDN Different From Another?

Sound simple enough? It is, if you ignore the fact that not all CDNs are created equal. A CDN’s underlying architecture affects how efficient its caching mechanism is, which in turn affects the user’s experience. But before we dig deeper, let’s quickly refresh our understanding of cache concepts (definitions from Wikipedia):

 

Cache - stores data so future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation, or the duplicate of data stored elsewhere.

 

Cache hit - occurs when the requested data can be found in the CDN cache; a cache miss occurs when it cannot.

 

So, based on the above definitions, a CDN can be termed more “cache efficient” if it has a higher cache hit rate over time, expressed as a percentage (%). The higher this percentage, the better the performance (efficiency) of the CDN.
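The definition above can be written as a small calculation. The following is a minimal Python sketch; the function name `cache_hit_rate` and the sample counts are illustrative assumptions, not taken from any CDN's reporting tools.

```python
def cache_hit_rate(hits: int, misses: int) -> float:
    """Return the cache hit rate as a percentage of all requests."""
    total = hits + misses
    if total == 0:
        return 0.0  # no requests yet, so report 0 rather than divide by zero
    return 100.0 * hits / total


# Hypothetical counts, for illustration only.
print(cache_hit_rate(hits=950, misses=50))  # 95.0
```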


Why does CDN cache efficiency matter for your business?

When your goal is to improve the end user’s online experience, the cost metrics of your technology infrastructure, or both, a CDN can play a critical role. Lower infrastructure costs save the company operational expenses, and lower latency gives customers a faster online experience, which on an e-commerce website, for example, may translate to more completed transactions.

 

So, if the content gets served from the CDN cache, it’s a “win-win”: you saved the cost of serving content and your end users had a great experience. But if it is a cache miss and you have to serve the content from your origin, that hurts the end-user experience (it takes longer to fetch the content from your origin) and your costs (depending on your cache efficiency, you need to have bandwidth available for all those requests). You may be thinking, “well, that’s not a big deal for a few thousand requests,” but if we extrapolate using some real-world data from httparchive.org, the results of flooding your origin with content requests are staggering.

 

What are the nuts and bolts of a typical web page?

Consider that a typical web page is made up of the following main components:

Figure 1

* For simplicity we will ignore iframes in a page that can contain more HTML

 

You can see in Figure 1 that a lot of the content delivered through the web browser is cacheable. And as the graph below from webpageoptimization illustrates, there are a lot of those elements on the average web page.

 

Figure 2

 

Out of the 100 objects, only the one HTML file and maybe a few other objects are uncacheable (i.e., they should not be stored in the CDN cache). This uncacheable content could be personalized information, login details, and so on. For the sake of argument, let’s say 95 of those 100 objects from Figure 2 (including videos, JavaScript, images and stylesheets) are cacheable and should be served by the CDN. The remaining five will be served from the content origin.

 

Now let’s use that extrapolation as the basis to see how cache efficiency impacts Quality of Experience (QoE) and your costs.

 

By using a tool like similarweb we can approximate the number of views per month that any web page receives. For example, the web page depicted below (Figure 3) received over 440,000 page views in November. (Note: a page view is defined as the number of requests for the base HTML file from the origin. Page view information can be extracted from your server logs or your CDN’s analytics portal.)

Figure 3

 

Let’s say the cache efficiency of the CDN is 80%. Based on our definition of cache efficiency, this means that of the 95 cacheable objects in a web page, only 80% would be served from cache, even though the remaining 20% are still cacheable! Now, let’s simulate the impact of various cache efficiency numbers on the traffic going back to the origin.

 

While there are many other dependencies for caching, such as browser behavior, cacheability settings like Time-To-Live (TTL), HTTP bugs, etc., our intention here is to discuss flushing due to CDN cache inefficiency. So, at 80% cache efficiency (80% of 95), 76 objects would be served by the CDN and the remaining 19 objects would be requested from the origin. Even though those 19 objects were cacheable, they were flushed out of the CDN cache. Using the 440,000 page views from Figure 3, we can estimate the total number of requests for cacheable objects going back to the content origin: 440,000 page views * 19 requests going back to the origin per page view = 8,360,000 requests.
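The arithmetic for the 80% case can be checked step by step. This is a minimal Python sketch; the constant names are chosen here for illustration, and the numbers come from the example in this post.

```python
CACHEABLE_OBJECTS_PER_PAGE = 95   # cacheable objects per page, from the example
PAGE_VIEWS_PER_MONTH = 440_000    # page views, from Figure 3
CACHE_EFFICIENCY_PCT = 80         # assumed CDN cache efficiency

# Cacheable objects that miss the CDN cache on each page view.
missed_per_page = CACHEABLE_OBJECTS_PER_PAGE * (100 - CACHE_EFFICIENCY_PCT) // 100

# Monthly requests for cacheable objects that reach the origin.
origin_requests = PAGE_VIEWS_PER_MONTH * missed_per_page

print(missed_per_page)   # 19
print(origin_requests)   # 8360000
```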

 

Similarly, we can simulate this for various cache efficiency percentages.

 

Figure 4

 

So, at 80% efficiency your content origin is serving about 8,360,000 requests that should be served by the CDN, and at 95% efficiency it is still serving about 2,200,000 requests (roughly five objects per page view). Imagine the amount of bandwidth you’d need to serve that kind of content each month!
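The same estimate can be repeated for a range of efficiency values. The sketch below is an illustration only: the function name `origin_requests_per_month` is hypothetical, and rounding to whole objects per page view is an assumption made here because it reproduces the 80% and 95% figures quoted in this post.

```python
def origin_requests_per_month(page_views: int, cacheable_objects: int,
                              efficiency_pct: int) -> int:
    """Estimate monthly requests for cacheable objects that reach the origin."""
    # Objects missed per page view, rounded half-up to a whole object
    # (assumption: whole-object rounding matches the figures in the post).
    missed = (cacheable_objects * (100 - efficiency_pct) + 50) // 100
    return page_views * missed


for pct in (80, 85, 90, 95, 98):
    requests = origin_requests_per_month(440_000, 95, pct)
    print(f"{pct}% efficiency -> {requests:,} origin requests per month")
```

Because the page-view count multiplies every missed object, even a few points of cache efficiency change the origin load by millions of requests per month in this example.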

 

What impacts the CDN cache efficiency?

A CDN’s architecture is directly responsible for how efficiently it caches content. The longer the objects/content can be retained in the cache without being flushed out, the better the CDN cache retention is. The more caching servers concentrated in a given node, the more content it can retain. For example, a CDN’s architecture can be densely architected—where hundreds or thousands of servers are available at any given network node to service requests—or it can be sparsely architected—a few servers in a lot of locations.

 

Most CDNs’ caching algorithms are based on keeping “popular content in cache”: as new content becomes popular, older content gets flushed to make space for it. Limelight Networks has a densely architected metro POP system with more available cache per customer, per location. This allows popular cached content to be retained much longer in the cache (without needing to flush less popular content). In comparison, sparsely architected CDNs may have a larger number of servers overall but cannot retain content as long (because less cache is available per location), resulting in much lower cache efficiency.
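The effect of available cache size on hit rate can be illustrated with a small simulation. The sketch below assumes a least-recently-used (LRU) eviction policy as a stand-in for "popular content stays, older content is flushed"; it is not a description of any particular CDN's algorithm. The request trace, object counts and capacities are all synthetic, hypothetical values.

```python
import random
from collections import OrderedDict


def lru_hit_rate(requests, capacity):
    """Replay `requests` through an LRU cache and return the hit rate (0..1)."""
    cache = OrderedDict()
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # flush the least recently used object
    return hits / len(requests) if requests else 0.0


# Synthetic trace: 1,000 objects, with low-numbered objects requested more often.
rng = random.Random(42)
objects = list(range(1000))
weights = [1 / (rank + 1) for rank in objects]
trace = rng.choices(objects, weights=weights, k=50_000)

small = lru_hit_rate(trace, capacity=50)    # less cache available per location
large = lru_hit_rate(trace, capacity=500)   # more cache available per location
print(f"small cache: {small:.1%}, large cache: {large:.1%}")
```

With LRU, a larger cache never produces a lower hit rate on the same trace, which is the intuition behind retaining more content per location.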

 

At Limelight Networks, a typical web application delivered by our CDN has over 98% cache efficiency! We can provide those high numbers because of our densely architected network delivery nodes. Each of our points of presence (POPs) has hundreds, if not thousands, of servers that allow content to be retained much longer in the cache. To further enhance cache efficiency, we use intelligent cache management that checks other servers in the path to the origin before requesting content from the origin. For example, the graph below shows cache efficiency data for one of our customers. In the month of November, the average CDN efficiency was 99.49%, which means that of all the requests for this customer’s cacheable content, over 99% were served from the CDN edge.
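The idea of checking other caches on the way to the origin can be sketched as a tiered lookup. This is a hypothetical illustration, not Limelight's implementation: the function `fetch`, the use of plain dictionaries as cache tiers, and the sample paths are all assumptions made for the example.

```python
def fetch(key, tiers, origin):
    """Look up `key` in each cache tier in order, falling back to the origin.

    `tiers` is a list of dicts ordered from the edge toward the origin.
    Returns (content, source), where source is a tier index or "origin".
    """
    for index, cache in enumerate(tiers):
        if key in cache:
            content = cache[key]
            # Fill the tiers closer to the user that missed.
            for nearer in tiers[:index]:
                nearer[key] = content
            return content, index
    content = origin[key]  # every tier missed, so the origin is contacted
    for cache in tiers:
        cache[key] = content
    return content, "origin"


edge, parent = {}, {"/logo.png": b"png-bytes"}
origin = {"/logo.png": b"png-bytes", "/app.js": b"js-bytes"}

print(fetch("/logo.png", [edge, parent], origin)[1])  # 1 (served by the parent tier)
print(fetch("/logo.png", [edge, parent], origin)[1])  # 0 (now cached at the edge)
print(fetch("/app.js", [edge, parent], origin)[1])    # origin
```

In this sketch the origin is contacted only when every tier misses, so an object flushed from one server can still be served without an origin request.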

 

Source: Limelight Control Portal

 

One of the core functions of a CDN is to make website and web application experiences better by storing content closer to end users. When selecting a CDN for your business, give the CDN’s cache efficiency at a global scale the same importance as its performance metrics (latency). By doing so, you can maximize the benefits of the CDN and ultimately provide a great customer experience while saving costs on origin infrastructure.
