Lecture 08 Consensus, Paxos
- Objectives
- Understand what problem Paxos solves.
- Understand how (Basic) Paxos provides consistency despite the loss of a
minority of servers.
- Understand how Paxos can be extended to support general high-availability
systems (Multi-Paxos and State Machine Replication).
Why Paxos?
Goal: Replicated Log (2)
- State machine: state with deterministic methods.
- Run the same state machine on all servers for reliability.
- Deterministic: so feed same commands in same order to all state machines and
they'll all have identical state at each logical point in time.
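A minimal sketch of the determinism point above, assuming a toy key-value state machine. The class name `KVStateMachine`, the `replay` helper, and the command format are hypothetical and not from the slides.

```python
class KVStateMachine:
    """A toy deterministic state machine: a key-value store (illustrative only)."""

    def __init__(self):
        self.state = {}

    def apply(self, command):
        # The result depends only on the current state and the command,
        # never on clocks, randomness, or which server runs it.
        op, key, *rest = command
        if op == "set":
            self.state[key] = rest[0]
            return rest[0]
        if op == "incr":
            self.state[key] = self.state.get(key, 0) + 1
            return self.state[key]
        raise ValueError(f"unknown op: {op}")


def replay(log):
    """Feed a whole log, in order, through a fresh state machine."""
    sm = KVStateMachine()
    for command in log:
        sm.apply(command)
    return sm.state


# Three "servers" that replay the same log end up in the same state.
log = [("set", "x", 5), ("incr", "x"), ("incr", "y")]
replicas = [replay(log) for _ in range(3)]
```

Replaying the same commands in a different order can give a different state, which is why the servers must agree on the order and not just on the set of commands.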
Why is the log needed? Why not feed each state machine synchronously?
- Unavailable servers need to 'catch up' when they rejoin the majority.
- Easy way to create a total order of commands, also.
Walkthrough:
- Client sends command to a server.
- Server records the operation in its log and uses the consensus module to
replicate the operation to the logs of the other servers.
- Once servers agree on a prefix of the log, that prefix can be fed through
the state machines.
- The state machine processes the commands and responds to the client.
Consensus module makes sure this replication happens safely.
- And works even if a minority of the nodes are down.
- A 5-node cluster works with 3 nodes up, etc.
Failure model: fail-stop/restart, arbitrary network partitions
The Paxos Approach (3)
- Basic Paxos:
- Nodes agree on a single value.
- Note: this is not like a read/write register.
- One-time use, monotonic.
- On its own this doesn't seem that useful, so we need Multi-Paxos.
Requirements for Basic Paxos (4)
- Safety: never do anything bad
- Omits validity: system only chooses a value that has been proposed.
- Liveness: eventually does something good
Paxos Components (5)
Strawman: Single Acceptor (6)
Problem: Split Votes (7)
(Could probably use more detail here on why majority quorum is a good idea.)
- Q: Why a majority quorum?
- It's the smallest possible set that is guaranteed to overlap with all
others of the same size.
- That is, if two operations require majority quorum and use maximally
non-overlapping sets, they still can't miss each other's changes.
- It's also 'small enough' that it tolerates some failures while remaining
available.
- Acceptors will have to change their mind to make this work.
- Can't guarantee agreement in a single round.
- Accepted != chosen
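The majority-quorum argument above can be checked by brute force for a small cluster. This is only a sketch; the helper names `majority` and `all_quorums_overlap` are made up for illustration.

```python
from itertools import combinations


def majority(n):
    """Smallest quorum size that is more than half of n servers."""
    return n // 2 + 1


def all_quorums_overlap(n, q):
    """True if every pair of distinct size-q subsets of n servers shares a server."""
    quorums = [set(c) for c in combinations(range(n), q)]
    return all(a & b for a, b in combinations(quorums, 2))


n = 5
q = majority(n)       # 3 of 5
tolerated = n - q     # servers that can be down while a quorum still exists
overlap_at_majority = all_quorums_overlap(n, q)
overlap_below_majority = all_quorums_overlap(n, q - 1)
```

With 5 servers, any two quorums of size 3 intersect, while quorums of size 2 (for example {0, 1} and {2, 3}) can miss each other entirely.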
Problem: Conflicting Choices (8)
- Two values chosen: violates the safety property that only a single value is
chosen.
- Solution: servers must first 'look around' to see if there are other values
out there that have already been chosen. If they find one, they can only
propose that value rather than their own.
- Creates a two-phase prepare/accept protocol:
- First, find chosen values,
- Second, propose a new value or the value found already chosen.
Problem: Conflicting Choices, cont'd (9)
- Still busted.
- Even if s1 and s5 look around first, they see nothing accepted or chosen.
- In the end we still end up choosing two different values again.
Need to order proposals and have acceptors reject old proposals.
- Idea: use the 'lookaround' phase to order proposals.
- 'Later' blue lookaround will be fatal to red's accepts.
Make point here: with this, once blue gets its value chosen, red's accept
(at s3) is dead.
Proposal Numbers (10)
Basic Paxos (11)
- Prepare
- Forces proposer to propose any already chosen value.
- Blocks acceptors from accepting older proposals to prevent them from
becoming chosen while this one is 'in-flight'.
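A hedged sketch of the acceptor side of the two roles of prepare described above. It assumes proposal numbers are plain positive integers and that 0 means "nothing yet"; the class and field names (`Acceptor`, `min_proposal`, and so on) are illustrative choices, not taken from the slides.

```python
class Acceptor:
    """Single-value (Basic Paxos) acceptor state; a sketch, not a full implementation."""

    def __init__(self):
        self.min_proposal = 0       # highest proposal number seen so far
        self.accepted_proposal = 0  # number of the last accepted proposal (0 = none)
        self.accepted_value = None

    def prepare(self, n):
        # Block older proposals: after this, accepts numbered below n are rejected.
        if n > self.min_proposal:
            self.min_proposal = n
        # Report anything already accepted so the proposer is forced to reuse it.
        return self.accepted_proposal, self.accepted_value

    def accept(self, n, value):
        # Accept only if no newer prepare has been seen in the meantime.
        if n >= self.min_proposal:
            self.min_proposal = n
            self.accepted_proposal = n
            self.accepted_value = value
        # The proposer compares the result with n to learn whether it was rejected.
        return self.min_proposal


acceptor = Acceptor()
first_reply = acceptor.prepare(2)     # nothing accepted yet
stale = acceptor.accept(1, "X")       # older proposal: rejected, returns 2
fresh = acceptor.accept(2, "Y")       # current proposal: accepted
later_reply = acceptor.prepare(3)     # a later proposer now learns about "Y"
```

In a real system this state would have to be written to stable storage before replying, so that it survives the fail-stop/restart failures in the failure model.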
Basic Paxos (12)
- Starts with client call and proposer wanting to choose a value.
Basic Paxos Examples (13)
Basic Paxos Examples, cont'd (14)
- Same as previous slide
- Majorities overlap, so s5 must see s3's accept.
- s5 must propose X - so consensus sticks on single value.
- Server must assume any accept it sees might be chosen, since it only issues
proposals to a majority of the cluster.
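The rule that a proposer must adopt any accepted value it sees can be sketched as below. The function name `choose_value`, the reply format (pairs of accepted proposal number and value, with 0 meaning nothing accepted), and the proposal number 31 are assumptions made for illustration.

```python
def choose_value(own_value, prepare_replies):
    """Pick the value to send in the accept phase.

    prepare_replies: non-empty list of (accepted_proposal, accepted_value)
    pairs gathered from a majority of acceptors.
    """
    best_n, best_value = max(prepare_replies, key=lambda reply: reply[0])
    # Any accepted value might already be chosen, so it must be reused;
    # only if nobody has accepted anything may the proposer use its own value.
    return best_value if best_n > 0 else own_value


# s5 wants Y, but one acceptor in its majority (playing the role of s3)
# already accepted X.
s5_value = choose_value("Y", [(0, None), (0, None), (31, "X")])

# With no accepts visible, the proposer is free to use its own value.
free_value = choose_value("Y", [(0, None), (0, None), (0, None)])
```

When several accepted values are reported, the sketch takes the one with the highest proposal number.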
Basic Paxos Examples, cont'd (15)
- s5's prepare kills off s1's ability to get its proposal accepted.
- This time Y is chosen, not X.
Competing proposers must overlap in at least one server, so they'll always
'see' each other.
End safety discussion: does everyone feel ok with this so far?
Liveness (16)
Other Notes (17)
- Proposer might not even know!
Multi-Paxos (18)
Multi-Paxos Issues (19)
Selecting Log Entries (20)
Selecting Log Entries, cont'd (21)
Improving Efficiency (22)
Leader Election (23)
- Unlikely that there are two leaders.
- But safe even if there are.
Eliminating Prepares (24)
Idea: upon a prepare for the current log slot, the acceptor also returns
whether it has accepted anything in any later log slot.
- If it hasn't, then leader doesn't need to issue prepares to it anymore.
- The proposal number covers the entire log.
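One way the acceptor side of this idea might look, as a sketch only. It assumes integer proposal numbers (0 = none) and log slots keyed by integer index; `LogAcceptor` and the `no_more_accepted` flag are hypothetical names.

```python
class LogAcceptor:
    """Multi-Paxos acceptor sketch: one proposal number covers the whole log."""

    def __init__(self):
        self.min_proposal = 0
        self.accepted = {}  # slot index -> (proposal number, value)

    def prepare(self, n, slot):
        if n > self.min_proposal:
            self.min_proposal = n
        accepted_n, accepted_value = self.accepted.get(slot, (0, None))
        # True if nothing has been accepted in any slot after this one.
        no_more_accepted = all(s <= slot for s in self.accepted)
        return accepted_n, accepted_value, no_more_accepted

    def accept(self, n, slot, value):
        if n >= self.min_proposal:
            self.min_proposal = n
            self.accepted[slot] = (n, value)
        return self.min_proposal


acceptor = LogAcceptor()
acceptor.accept(1, 3, "cmd3")          # an earlier leader filled slot 3
reply_slot2 = acceptor.prepare(2, 2)   # slot 3 is later, so the flag is False
reply_slot3 = acceptor.prepare(2, 3)   # nothing beyond slot 3, so the flag is True
old_leader = acceptor.accept(1, 4, "stale")  # rejected: prepare 2 covers all slots
```

Once an acceptor answers with the flag set, the leader can send it accepts directly for later slots; a rejected accept tells the leader that another proposer has taken over.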
Q: Lamport suggests something a bit different in Paxos Made Simple.
- His is more of a range-based prepare phase.
Full Disclosure (25)
- 1/4 breaks down when leader crashes and gives up on forcing accepts
Full Disclosure (26)
Full Disclosure (27)
Client Protocol (28)
Client Protocol, cont'd (29)
- Q: Is this correct?
- What if a client request is delayed for a long time?
- Fix: remember all ids, make them monotonic, or require that the client
specify a pair of old and new unique ids, so the effect only happens if
that unique transition hasn't occurred before.
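A minimal sketch of the first fix above (remember all ids), assuming the state machine is a simple counter and each client request carries a unique id. The class name `DedupStateMachine` and the id format are made up for illustration.

```python
class DedupStateMachine:
    """Counter state machine that executes each request id at most once."""

    def __init__(self):
        self.counter = 0
        self.results = {}  # request id -> result of the first execution

    def apply(self, request_id, amount):
        # A retried or long-delayed duplicate gets the saved result
        # instead of being executed a second time.
        if request_id in self.results:
            return self.results[request_id]
        self.counter += amount
        self.results[request_id] = self.counter
        return self.counter


sm = DedupStateMachine()
first = sm.apply("client1-req1", 5)
retry = sm.apply("client1-req1", 5)   # duplicate of the same request
second = sm.apply("client1-req2", 1)
```

Remembering every id forever costs unbounded memory, which is what the monotonic-id and old/new-id variants try to avoid.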
Configuration Changes (30)
- The parameter a limits concurrency: can't choose entry i + a until entry i
is chosen.
- Because we don't even know the set of servers in the cluster at i + a.
- Q: What configuration changes make sense?