Distributed Systems, CAP Theorem

A distributed system is

In such a system, if one server (say, the server for reports) goes down, the whole system collapses: that server is a single point of failure.

One way to solve this:

We can keep a copy (replica) of each of the servers, so if one server fails, the other can take over. The copies can also share the traffic through load balancing.

Let’s say we had multiple databases where one of them takes read/write requests, and the other replicas take only read requests. Let’s say we write something to the write DB, and the very next millisecond a read request is sent to one of the other DBs. That DB may not have the new data yet, so the read returns a stale value and consistency is lost. Here, we’d have eventual consistency, i.e. the read DBs get updated a little later.
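The stale read described above can be sketched in a few lines. This is only a toy model, not how any real database does replication: the names `Primary`, `Replica` and `replicate` are made up for illustration, and replication is a manual function call instead of a background process.

```python
class Primary:
    """Accepts reads and writes; queues changes for the replica (asynchronous replication)."""

    def __init__(self):
        self.data = {}
        self.pending = []  # changes not yet shipped to the replica

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))  # replicated later, not now


class Replica:
    """Serves reads only; sees a write only after replicate() has run."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


def replicate(primary, replica):
    """Ship every pending change from the primary to the replica."""
    for key, value in primary.pending:
        replica.data[key] = value
    primary.pending.clear()


primary, replica = Primary(), Replica()
primary.write("balance", 20)
stale = replica.read("balance")  # None: the replica has not caught up yet
replicate(primary, replica)
fresh = replica.read("balance")  # 20: the replica is now consistent
```

The gap between the write and `replicate()` is the window in which a read from the replica is inconsistent with the primary.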

There are levels of consistency:

Strong Consistency - Every read sees the latest write as soon as the write is acknowledged. (For example, bank systems need it: if I have 20 dhs in the bank and I withdraw it at 2 pm, Dad shouldn't be able to withdraw it again 5 milliseconds later; the DB needs to update the amount to 0 ASAP.)
- This is achieved by updating the other DBs before the response confirming the update is sent back - Synchronous Replication
- RYW - Read Your Writes - If I update something in the DB, it's fine if others see it eventually, but I should be able to see it instantly (if I post on Instagram, it's fine if people see it some time later, but I should be able to see my own post ASAP). This is a weaker guarantee than strong consistency.
Eventual Consistency - Data updates eventually. (Data is first updated in the primary DB, then the response is sent, and the rest of the DBs get updated later.)
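A rough sketch of the three behaviours above, under simplifying assumptions: everything runs in one process, `Cluster` and its methods are hypothetical names, and Read Your Writes is approximated by sending the last writer of a key to the primary. Real systems use other mechanisms (sessions, version tokens), so treat this only as an illustration of the idea.

```python
class Node:
    """One database copy holding key/value data."""

    def __init__(self):
        self.data = {}


class Cluster:
    def __init__(self, replica_count=2):
        self.primary = Node()
        self.replicas = [Node() for _ in range(replica_count)]
        self.backlog = []      # acknowledged writes not yet on the replicas
        self.last_writer = {}  # key -> user who last wrote it

    def write_sync(self, key, value, user=None):
        """Synchronous replication: update every replica, then acknowledge."""
        self.primary.data[key] = value
        self.last_writer[key] = user
        for replica in self.replicas:
            replica.data[key] = value
        return "ack"

    def write_async(self, key, value, user=None):
        """Eventual consistency: acknowledge now, replicate later."""
        self.primary.data[key] = value
        self.last_writer[key] = user
        self.backlog.append((key, value))
        return "ack"

    def catch_up(self):
        """Apply the backlog to every replica (the 'eventually' part)."""
        for key, value in self.backlog:
            for replica in self.replicas:
                replica.data[key] = value
        self.backlog.clear()

    def read(self, key, user=None):
        """Read Your Writes: the last writer reads from the primary,
        everyone else reads from a replica."""
        if user is not None and self.last_writer.get(key) == user:
            return self.primary.data.get(key)
        return self.replicas[0].data.get(key)


cluster = Cluster()
cluster.write_sync("balance", 0, user="me")
dad_sees = cluster.read("balance", user="dad")  # replica already updated

cluster.write_async("post", "hello", user="me")
mine = cluster.read("post", user="me")          # served by the primary
theirs = cluster.read("post", user="friend")    # replica not updated yet
cluster.catch_up()
later = cluster.read("post", user="friend")     # replica has caught up
```

With `write_sync` the acknowledgement waits for the replicas, so it is slower but nobody can read an old balance. With `write_async` the author sees the post immediately while others see it only after `catch_up()`.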

Availability:

UX degrades massively if the system isn't available.

The availability target of a system is usually stated in an SLA (Service Level Agreement) and expressed as a number of 9s. As the number of 9s increases, availability increases. Two 9s - 99% (down-time 87.6 hrs per year), three 9s - 99.9% (8.76 hrs per year), four 9s - 99.99% (52.6 minutes per year) (banking systems), five 9s - 99.999% (about 5.26 minutes per year) (airline, healthcare).
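The down-time figures above are just the unavailable fraction multiplied by the hours in a year. A small sketch of that arithmetic, assuming a 365-day year (the function name is made up for illustration):

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_percent):
    """Allowed down-time per year for a given availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for pct in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% -> {hours:.2f} hrs/year ({hours * 60:.1f} minutes)")
```

Each extra 9 cuts the allowed down-time by a factor of ten.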

How can we increase availability of systems?

Replicas of servers
Horizontal Scaling
CDNs
Load Balancing
Fail-Over mechanism (if one server goes down, have another one take over)
Monitoring
Cloud Services
Scheduled maintenance and test-case simulation for the system
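Two items from the list, load balancing and fail-over, can be sketched together. This is a minimal in-process illustration with hypothetical `Server` and `LoadBalancer` classes; a real load balancer would use health checks and network calls instead of a `healthy` flag.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class LoadBalancer:
    def __init__(self, servers):
        self.servers = servers
        self.next_index = 0  # round-robin position

    def route(self, request):
        # Try each server at most once, starting from the round-robin position.
        for _ in range(len(self.servers)):
            server = self.servers[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.servers)
            try:
                return server.handle(request)
            except ConnectionError:
                continue  # fail over to the next replica
        raise RuntimeError("no healthy server available")


lb = LoadBalancer([Server("a"), Server("b")])
first = lb.route("r1")         # goes to server a
second = lb.route("r2")        # goes to server b
lb.servers[0].healthy = False  # server a goes down
third = lb.route("r3")         # a fails, so b takes over
```

As long as one replica is healthy, requests keep getting answered, which is why replicas plus fail-over raise availability.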

Let’s say we have 2 DBs that communicate over a network. When you update data in one DB and the update is on its way to the other DB, it could fail due to network issues, security issues, firewall issues or anything else. In such a case, we could either 1) shut down the DB where the update hasn't happened, in which case availability goes down, or 2) keep it running as it is, in which case consistency goes down.

In option 1) above, I'd have CP: data remains consistent and the system is partition tolerant, but availability goes down. In option 2), it's AP: the system stays available and partition tolerant, but consistency goes down.
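The two choices can be made concrete with a toy model of two DBs and a link that can break. `PartitionedPair`, `link_up` and the method names are assumptions made for this sketch, not part of any real database.

```python
class PartitionedPair:
    """Two database nodes, A and B, that may lose the link between them."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = {}
        self.b = {}
        self.link_up = True

    def write_to_a(self, key, value):
        self.a[key] = value
        if self.link_up:
            self.b[key] = value  # replication succeeds only while the link is up

    def read_from_b(self, key):
        if not self.link_up and self.mode == "CP":
            # Option 1: refuse to serve possibly stale data.
            raise RuntimeError("node B unavailable during partition")
        # Option 2 (AP), or no partition: answer with whatever B has.
        return self.b.get(key)


ap = PartitionedPair("AP")
ap.write_to_a("x", 1)
ap.link_up = False               # the network partition happens
ap.write_to_a("x", 2)            # this update never reaches B
ap_answer = ap.read_from_b("x")  # 1: available, but stale

cp = PartitionedPair("CP")
cp.write_to_a("x", 1)
cp.link_up = False
try:
    cp.read_from_b("x")
    cp_answered = True
except RuntimeError:
    cp_answered = False          # consistent, but not available
```

The AP pair keeps answering with old data, while the CP pair stops answering until the link is back.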

We can only have CA in monolithic architectures, where network partitions don't exist.

Theoretically, you can have a CA system in a microservices architecture by having backup networks, but that increases infrastructure cost.

For example: YouTube comments: Availability is more important (you want to see comments more than you want to see the very latest comment)

Instagram feed: Availability again (you want to see posts more than you want to see the very latest one)

Amazon Cart: Consistency is more important (you need to see CORRECT information there)

Uber Payment: Consistency

Uber Booking: Availability (not seeing the very latest nearest driver is fine (consistency), but you do need to see drivers)

WhatsApp: Consistency (the same chats need to be seen everywhere, and it should be okay even if a chat loads 20 seconds later)

ATM: Consistency

Google Docs: Consistency