Caching, Load Balancing

Let's say we're designing a trending page (YouTube). The top trending page would be the same all across India. We'd have a title, thumbnail, YouTuber name, etc.

Let's say you're running a sale on amazon.com; you want to show the same sale products to all users.

Similarly in Zerodha,

Last week's report, analysis, last month's report, etc. These require data processing before the reports can be served.

Similarly in Swiggy,

Menus.

The solution for this is caching. Caching is based on the locality of reference principle, which states that anything that is requested is likely to be requested again.

Caching is faster than the DB because the data is stored in memory, whereas the DB has higher latency. The cache is temporary storage, while the DB is the source of truth. Caching works for small to medium datasets, but not for large datasets, because memory is expensive.

You can cache at any of the four places above. Where you place the cache depends on whether it optimizes the design. For example, caching at the DB is much less beneficial than caching at the client, since the response still has to travel all the way from the DB to the client.

How do we decide what data is stored and what is not stored in the cache, based on the requests the client sends?

This is decided by the eviction policy.

An eviction policy says that, once the cache is full, some data needs to be thrown out of the set of data being stored.

Workflow of a cache-then-DB request: the client first checks for the response in the cache; otherwise the request is sent to the DB. If the data is found in the cache, it's called a cache hit; otherwise, it's called a cache miss.

My system would be bad if I have many cache misses and very few cache hits, since every request first checks the cache and then still has to go to the DB.
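A minimal sketch of this read path in Python (the `db` dict and the key names are stand-ins for a real database and real data, purely for illustration):

```python
# Cache-aside read path: check the cache first, fall back to the DB on a miss.
db = {"video:123": {"title": "Trending #1", "thumbnail": "thumb.jpg"}}  # stand-in for the real DB
cache = {}

def get(key):
    if key in cache:
        print("cache hit")
        return cache[key]
    print("cache miss")
    value = db.get(key)      # the slow path: go to the source of truth
    cache[key] = value       # populate the cache for the next request
    return value

get("video:123")  # miss: goes to the DB and fills the cache
get("video:123")  # hit: served from memory
```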

Eviction Policies:

  • LRU - Least Recently Used

  • LFU - Least Frequently Used

  • FIFO - First In, First Out

  • LIFO - Last In, First Out

  • MRU - Most Recently Used

  • RR - Random Replacement

LRU:

Every time a piece of data gets used, it moves to the top of a stack-like structure; when the cache is full, the item that was least recently used is evicted.
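A toy LRU cache using Python's `OrderedDict` to mimic that stack-like behaviour (the capacity and keys here are made up for illustration):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None                     # cache miss
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict the least recently used item

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" is now the most recently used
cache.put("c", 3)   # cache is full, so "b" (least recently used) is evicted
print(list(cache.data.keys()))  # ['a', 'c']
```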

LFU:

In an LFU cache eviction policy, each item in the cache is associated with a counter that tracks how many times the item has been accessed or used. When the cache is full and a new item needs to be stored, the LFU policy looks for the item with the lowest access count and evicts it to create space for the new item.
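A rough sketch of the same idea with counters (simplified: real LFU implementations also break ties by recency):

```python
class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}
        self.counts = {}                    # access counter per key

    def get(self, key):
        if key not in self.data:
            return None
        self.counts[key] += 1               # every access bumps the counter
        return self.data[key]

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            victim = min(self.counts, key=self.counts.get)  # lowest access count
            del self.data[victim]
            del self.counts[victim]
        self.data[key] = value
        self.counts[key] = self.counts.get(key, 0) + 1
```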

Updates in a DB:

Cache Invalidation:

Let's say there's a thumbnail in the cache, but you update the thumbnail in the DB. You now need to invalidate the cached thumbnail to maintain consistency between the DB and the cache. We either need to make our system strongly consistent or eventually consistent.
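A sketch of that invalidation step, with the same kind of illustrative `db`/`cache` dicts standing in for the real stores: when the DB row changes, the stale cache entry is simply deleted so the next read repopulates it.

```python
db = {"video:123": {"thumbnail": "old.jpg"}}       # source of truth (stand-in)
cache = {"video:123": {"thumbnail": "old.jpg"}}    # cached copy

def update_thumbnail(video_id, new_thumbnail):
    db[video_id]["thumbnail"] = new_thumbnail      # update the source of truth
    cache.pop(video_id, None)                      # invalidate the stale cached copy

update_thumbnail("video:123", "new.jpg")
print(cache)   # {} -> the next read misses and repopulates from the DB
```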

TTL (time to live):

For example, if it's 8 pm and the TTL expires at 8:05 pm, I'll get data from the cache until 8:05; after that, the request will go to the DB.

For example, if a YouTube thumbnail changes, you can have a TTL of around 10 minutes, so you might only see the new thumbnail up to 10 minutes later. Similarly, a JWT auth token might be cached for around 30 seconds.

The tradeoff here is that we gain speed, but consistency is sacrificed until the TTL expires.
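A sketch of TTL-based expiry, storing each entry with a deadline (the 10-minute TTL and the key are illustrative):

```python
import time

cache = {}   # key -> (value, expires_at)

def put(key, value, ttl_seconds):
    cache[key] = (value, time.time() + ttl_seconds)

def get(key):
    entry = cache.get(key)
    if entry is None:
        return None                  # miss: go to the DB
    value, expires_at = entry
    if time.time() > expires_at:
        del cache[key]               # TTL passed: evict and fall back to the DB
        return None
    return value                     # still fresh: serve from the cache

put("thumbnail:123", "thumb_v1.jpg", ttl_seconds=600)   # cached for 10 minutes
print(get("thumbnail:123"))
```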

Cache Write Strategies:

  • Write through Cache:

When we want to update something in the DB (like a YouTube thumbnail), we first write the new value to the cache and then write it to the DB, before responding.

  • Write back cache:

We write the data to the cache and send the response back immediately; the data is written to the DB asynchronously.

Advantages:

  1. Improved performance

  2. Lower Latency

  3. Higher throughput can be handled

Disadvantages:

  1. Data integrity risk

  2. Not reliable (data is lost if the cache goes down before the DB gets updated)

  • Write Around Cache:

We write the data first to the DB, and then on the way back to the client we write it into the cache.
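The three strategies differ only in the ordering of the cache write, the DB write, and the response. A compressed sketch, assuming in-memory dicts for the cache and DB and a simple queue standing in for the asynchronous write-back worker:

```python
import queue

cache, db = {}, {}
write_back_queue = queue.Queue()        # drained by a background worker in a real system

def write_through(key, value):
    cache[key] = value                  # 1. write to the cache
    db[key] = value                     # 2. write to the DB, then respond

def write_back(key, value):
    cache[key] = value                  # 1. write to the cache and respond immediately
    write_back_queue.put((key, value))  # 2. DB write happens later, asynchronously

def write_around(key, value):
    db[key] = value                     # 1. write to the DB first
    cache[key] = value                  # 2. fill the cache on the way back to the client
```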

Cache System Examples:

  • Redis

  • Memcached

The above are distributed caches.

Benefits:

  • Scalable

  • High throughput possible

  • Availability increases etc.

Problem with different caches for each server?

  • If each server has its own cache and we check the cache before the DB, a request routed to one server updates the data in that server's cache only; if the load balancer later sends a request to another server, that server's cache would still have legacy (stale) information.

Hence, in a distributed system with multiple servers, all the servers share a primary cache (global cache), which is distributed internally.

This maintains cache coherency/consistency.

Less data redundancy.

Improved Latency. (since just 1 global cache needs to be updated)

Two famous caches used in the industry:

Redis, Memcached

Redis:

  • In Redis, after a certain amount of time, a snapshot of the data in the cache is taken and stored on disk. So even if the cache goes down, the data is available on disk.

  • Redis has persistent storage.

  • There is also an append-only mode (AOF), where every update gets written to disk.

  • Uses write-back caching
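For example, with the redis-py client a value can be cached with a TTL in one call (the host, port, key, and TTL below are assumptions for illustration, not part of these notes):

```python
import redis

# Connection details are assumed; adjust for your setup.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

r.set("video:123:thumbnail", "thumb_v2.jpg", ex=600)   # cache with a 10-minute TTL
print(r.get("video:123:thumbnail"))                    # served from Redis until it expires
print(r.ttl("video:123:thumbnail"))                    # seconds remaining before eviction
```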

Memcached:

  • No persistent storage

  • Data can be lost, but read performance is very good.

Note: While caching can be done at any level, you'd want to do it on a level closest to the client to reduce latency.

Proxy:

A proxy is essentially an intermediary component that acts on behalf of another entity.

Client-side proxy - Forward proxy (a component that acts on behalf of the client)

Server-side proxy - Reverse proxy (a component that acts on behalf of the server)

When the client interacts with the server, the server receives the request from the forward proxy, and the client receives the response from the reverse proxy.

Forward Proxy:

  • To provide anonymity (privacy) for the client when a request is sent to the server.

  • Content Filtering - Content filtering is like a smart filter for the internet. It allows you to control what kind of content is accessible to you or others on a network. With content filtering, you can block specific websites or types of content that you don't want to see or that might be inappropriate or harmful. For example, schools or workplaces might use content filtering to block social media websites, gambling sites, or explicit content to maintain a productive and safe environment.

  • Enables Caching - Caching is like keeping a copy of something handy for quicker access. When you visit a website, your browser stores some of the website's elements like images, scripts, and other resources in a temporary storage area called the cache. The next time you visit the same website, your browser can retrieve these elements from the cache instead of downloading them again from the internet. This makes the website load faster and reduces the strain on the internet connection.

Reverse Proxy:

  • Enables security (Firewalls etc)

  • Enables Load Balancing

  • Enables caching here too

  • The HTTPS handshake builds a connection just to the reverse proxy instead of to all the servers, improving latency; the servers are offloaded from the encryption work.

Also, a load balancer need not be a single point of failure, because most requests coming to it are stateless, so it can easily be replicated.

Which server the request goes to is decided using service routing:

When a request is sent, it goes to the service directory, which has the IP and port of all servers, and it responds with the relevant IP(s) and port(s). Deciding which server within a service (e.g. the order management system) the request is then sent to is called load balancing.

Load Balancing Strategies:

  1. Round Robin: in a circular manner, the request goes to each server one by one. Disadvantage: we don't take the health of a server into account. This can be improved using Weighted Round Robin for better resource allocation. (The first three strategies are sketched after this list.)

  2. Least Connections: whichever server has the lesser load (fewer active connections) gets the request. Improved using Weighted Least Connections for better resource allocation.

  3. IP Hashing: The same client sends all requests for a period of time to the same server. Helps with stateful requests.

  4. Content-Based: if it's a video upload/download, it's a heavy operation, and the request is sent to a server that can best handle it.

  5. Least Response Time: Requests sent to the server which has the least response time.
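A small sketch of the first three strategies, picking a server from a static list (server names, connection counts, and the hash function are illustrative choices):

```python
import hashlib
import itertools

servers = ["server-1", "server-2", "server-3"]

# 1. Round Robin: hand out servers in a fixed circular order.
_round_robin = itertools.cycle(servers)
def pick_round_robin():
    return next(_round_robin)

# 2. Least Connections: pick the server currently handling the fewest requests.
active_connections = {"server-1": 12, "server-2": 3, "server-3": 7}
def pick_least_connections():
    return min(active_connections, key=active_connections.get)

# 3. IP Hashing: the same client IP always maps to the same server.
def pick_ip_hash(client_ip):
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print(pick_round_robin())            # server-1
print(pick_least_connections())      # server-2
print(pick_ip_hash("203.0.113.7"))   # always the same server for this IP
```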

CDN: (Content Delivery Network)

  • Helps solve Latency.

  • Helps deliver content efficiently through geographical distribution.

  • Servers are placed close to end-users to deliver content efficiently. (Points of Presence)

  • The first time a piece of content is requested, it is brought to the server closest to you. The next time someone around you opens that content, it loads quicker since it comes from a closer location.

  • This is why videos that are watched very rarely take longer to load.

Examples: Akamai, CloudFront, Google Cloud CDN